Compilation of an informative microsatellite set for genetic characterisation of East African finger millet (Eleusine coracana)

Background: The genetic diversity of finger millet (Eleusine coracana), a nutritious but neglected staple cereal in Africa and South Asia, is largely uncharacterized. This study analysed 82 published SSR markers for finger millet across 10 diverse accessions to compile an informative set for genetic characterisation. An extensive optimization compared single samples with bulked leaf or bulked DNA samples for capturing within-accession genetic diversity. The markers were evaluated to determine (1) how efficiently they amplified target loci during high-throughput genotyping with a generic PCR protocol, (2) how easily the PCR products could be scored and (3) their polymorphism and ability to discern genetic diversity within the tested finger millet germplasm. Results: Across 88 samples, the 52 markers that worked well amplified 274 alleles, ranging from 2 to 14 per locus with a mean of 4.89. Major allele frequency ranged from 0.18 to 0.93 with a mean of 0.57. Polymorphic Information Content (PIC) ranged from 0.13 to 0.88 with a mean of 0.5, and availability varied between 64 and 100% with a mean of 92.8%. Heterozygosity ranged from 0 to 1.0, with a mean of 0.26. Discussion: Five individual samples from an accession captured the largest number of alleles per locus compared with the four bulked sampling strategies, but this difference was not significant. The identified set comprised 20 markers: UGEP24, UGEP53, UGEP84, UGEP27, UGEP98, UGEP95, UGEP64, UGEP33, UGEP67, UGEP106, UGEP110, UGEP57, UGEP96, UGEP66, UGEP46, UGEP79, UGEP20, UGEP12, UGEP73 and UGEP5, and has since been used to assess East African finger millet genetic diversity in two separate studies.


Introduction
Finger millet (Eleusine coracana, 2n = 4x = 36) is a nutritious cereal staple that originated in the tropical and sub-tropical parts of Africa and Asia. It is adapted to a wide range of adverse agro-ecological conditions, requires minimal inputs and provides critical plant genetic resources on which millions of people farming infertile and marginal lands depend for food and household income [1]. Finger millet ranks third in cereal production in the semi-arid regions of the world after sorghum and pearl millet [2], but remains one of the neglected and underutilized crops of Africa even though it is extensively cultivated, mostly by subsistence farmers. It serves as a food security crop because of its high nutritional value and excellent storage qualities [3]. The seeds contain up to 14% protein and are particularly rich in methionine, iron and calcium. In the marginal and medium agricultural zones of Ethiopia, Eritrea and Uganda, finger millet is a high-priority staple [4].
To date, only a limited number of SSR markers have been reported for finger millet. In this study, we analysed the 82 markers developed by Dida et al. [3] for their usefulness in discerning genetic diversity in finger millet, using 10 diverse cultivated finger millet accessions, including two from the mini-core collection developed by Upadhyaya et al. [5].
An extensive optimization study was undertaken to determine whether DNA from more than one individual can be bulked into a single sample, reducing the number of samples needed to genotype an accession. Bulking would allow a larger number of genotypes to be analysed cost-effectively, instead of analysing multiple individuals from a single accession. This is a feasible approach for finger millet because, although the crop is an allotetraploid, it is 99% inbred [6] and generally behaves like a diploid species, with 2 homozygous alleles per locus, so little within-accession variation at the DNA level is expected. Alternatively, we wanted to determine whether a single sample from an accession is adequate to capture within-accession diversity. To this end, the 82 SSR markers were evaluated for how well they worked in a high-throughput genotyping system, specifically in terms of the use of a generic PCR protocol, ease of scoring PCR products, polymorphism and ability to discern genetic diversity within the tested finger millet germplasm.
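The expectation that bulking loses little within-accession diversity can be made concrete with a simple binomial calculation. The sketch below is illustrative only and assumes that each plant in an inbred accession is homozygous and carries a given allele independently with frequency p; under that assumption, the probability that at least one of n sampled plants carries the allele is 1 - (1 - p)^n. The function name is hypothetical.

```python
def prob_allele_sampled(p: float, n: int) -> float:
    """Probability that at least one of n homozygous plants carries an allele
    present at frequency p in the accession (independent sampling assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a frequency between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # complement of "none of the n plants carries the allele"
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # bulk sizes used in this study: single plants, 3 and 5 plants
    for p in (0.05, 0.2, 0.5):
        for n in (1, 3, 5):
            print(f"p={p:.2f} n={n}: {prob_allele_sampled(p, n):.3f}")
```

For p = 0.2 the probability is 0.488 with 3 plants and about 0.67 with 5. Because only the two most prominent peaks are scored per sample, presence of an allele in a bulk is necessary but not sufficient for it to be recorded.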

DNA extraction
Ten finger millet accessions were selected to represent the diverse agro-ecologies of the crop's main area of cultivation in eastern Africa (Kenya, Uganda and Tanzania); they are listed in Table 1. They comprised accessions GBK-000414A, GBK-011135A and GBK-044047A from Kenya, obtained through the National Genebank of Kenya (NGBK); Sansamula and Namakonta from Tanzania; Ebega, Emorumoru and Bulo from Uganda; and IE2572 and IE2957 from the international mini-core collection for finger millet [5]. For each accession, DNA was extracted from fresh leaves of 14-d-old seedlings of 5 individuals to prepare 9 different samples for genotyping: DNA from each of the 5 individuals separately (samples 1 to 5), equal-sized leaf samples from 3 individuals bulked in a single sample (3 LB, sample 6) and leaves from 5 individuals bulked in a single sample (5 LB, sample 7). In addition, extracted DNA, normalized to 10 ng/μL post-extraction, was pooled from 3 individuals (3 DB, sample 8) and from 5 individuals (5 DB, sample 9). DNA extraction was done according to the modified CTAB protocol of Mace et al. [7], omitting the phenol:chloroform step. Extracted DNA was visualised on a 0.8% (w/v) agarose gel and quantified spectrophotometrically using a Nanodrop® 1000 (Thermo Scientific, USA), followed by dilution to 10 ng/μL in TE buffer (10 mM Tris, 0.1 mM EDTA, pH 8.0).
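Normalizing each extract to 10 ng/μL is a standard C1V1 = C2V2 dilution. A minimal sketch follows; the 100 μL final volume is an assumption, since the working volume is not stated, and the function name is hypothetical.

```python
def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float = 10.0,
                     final_volume_ul: float = 100.0) -> tuple[float, float]:
    """Return (DNA extract volume, TE volume) in uL, using C1*V1 = C2*V2.

    final_volume_ul defaults to a hypothetical 100 uL working volume.
    """
    if target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("target concentration and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("extract is already below the target concentration")
    dna_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul  # V1 = C2*V2/C1
    return dna_ul, final_volume_ul - dna_ul  # make up the rest with TE


if __name__ == "__main__":
    print(dilution_volumes(250.0))  # extract read at 250 ng/uL on the Nanodrop
```

For an extract quantified at 250 ng/μL, 4 μL of DNA is made up to 100 μL with 96 μL of TE. If the DB samples are pooled in equal volumes after normalization (an assumption; the text only states that normalized DNA was pooled), each individual contributes an equal mass of template.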

SSR genotyping optimization
The 9 different DNA samples described above for each accession (90 samples in all) were genotyped using 82 published SSR markers for finger millet [3]. All forward primers carried an M13 tag (5′-CACGACGTTGTAAAACGAC-3′) at the 5′ end, which allowed incorporation of a fluorescent label during PCR for detection of amplification products [8]. PCR amplification was performed in 10 μL reactions in 384-well microtitre plates; each reaction comprised 1× PCR buffer (20 mM Tris-HCl, pH 7.6; 100 mM KCl; 0.1 mM EDTA; 1 mM DTT; 0.5% (w/v) Triton X-100; 50% (v/v) glycerol), 2 mM MgCl2, 0.16 mM dNTPs, 0.16 μM fluorescently labelled M13 forward primer, 0.04 μM forward primer, 0.2 μM reverse primer, 0.2 units of Taq DNA polymerase (SibEnzyme Ltd, Russia) and 30 ng of template DNA. PCR was performed on a GeneAmp 9700 thermocycler (Applied Biosystems) with initial denaturation at 94°C for 5 min, followed by 35 cycles of 94°C for 30 s, 59°C for 1 min and 72°C for 2 min, and a final elongation at 72°C for 20 min. Amplification was confirmed by running 4 μL of the products on a 2% (w/v) agarose gel stained with GelRed® (Biotium, USA) and visualised under UV light.
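The recipe above gives final concentrations only. Turning it into pipetting volumes requires stock concentrations, which are not stated; the stocks below (10× buffer, 25 mM MgCl2, 10 mM dNTPs, 10 μM primers, 5 U/μL Taq) are hypothetical placeholders to be replaced with the actual reagents. The template volume follows from 30 ng at the 10 ng/μL working concentration. Names are illustrative.

```python
REACTION_UL = 10.0

# component: (final concentration, HYPOTHETICAL stock concentration), same units per row
COMPONENTS = {
    "PCR buffer": (1.0, 10.0),            # x
    "MgCl2": (2.0, 25.0),                 # mM
    "dNTPs": (0.16, 10.0),                # mM
    "M13 labelled primer": (0.16, 10.0),  # uM
    "forward primer": (0.04, 10.0),       # uM
    "reverse primer": (0.2, 10.0),        # uM
}
TAQ_UNITS, TAQ_STOCK_U_PER_UL = 0.2, 5.0     # stock is hypothetical
TEMPLATE_NG, TEMPLATE_NG_PER_UL = 30.0, 10.0  # from the text


def per_reaction_volumes() -> dict[str, float]:
    """Volumes (uL) per 10 uL reaction, from V = C_final * V_total / C_stock."""
    vols = {name: final * REACTION_UL / stock for name, (final, stock) in COMPONENTS.items()}
    vols["Taq polymerase"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    vols["template DNA"] = TEMPLATE_NG / TEMPLATE_NG_PER_UL
    vols["water"] = REACTION_UL - sum(vols.values())  # top up to the reaction volume
    return vols


def master_mix(n_reactions: int, overage: float = 0.1) -> dict[str, float]:
    """Scale everything except template (pipetted per well) for n reactions plus overage."""
    factor = n_reactions * (1.0 + overage)
    return {k: round(v * factor, 2) for k, v in per_reaction_volumes().items()
            if k != "template DNA"}


if __name__ == "__main__":
    for name, ul in master_mix(384).items():
        print(f"{name}: {ul} uL")
```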
Amplification products (1.5 μL to 3.5 μL of each) were co-loaded in sets of 3 to 4 markers together with the internal size standard GeneScan™-500 LIZ® (Applied Biosystems) and Hi-Di™ Formamide (Applied Biosystems), and separated by capillary electrophoresis on an ABI Prism® 3730 Genetic Analyzer (Applied Biosystems). Alleles were called with GeneMapper 4.0 (Applied Biosystems), and the allelic data for each marker were analysed with PowerMarker V3.25 [9] and Arlequin [10]. DARwin V5 [11] was used to generate unweighted neighbour-joining dendrograms from the allelic data of the 52 SSRs that worked well, as well as from a subset of 20 markers selected for use in further genetic diversity studies on finger millet. This was done to confirm that the selected subset could discern genetic diversity as effectively as the full set of 52 markers.
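DARwin builds its trees from a pairwise dissimilarity matrix. The text does not state which dissimilarity index was used, so the sketch below assumes a simple allele-matching index for diploid-scored SSR data, d = 1 - shared alleles / (ploidy × loci scored in both samples), skipping loci missing in either sample. It is not DARwin's code, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

Genotype = Optional[tuple[int, int]]  # two allele sizes in bp; None if not scored


def allele_matching_dissimilarity(x: Sequence[Genotype], y: Sequence[Genotype],
                                  ploidy: int = 2) -> float:
    """1 - (shared alleles) / (ploidy * loci scored in both samples)."""
    if len(x) != len(y):
        raise ValueError("samples must be scored at the same loci")
    shared = scored = 0
    for gx, gy in zip(x, y):
        if gx is None or gy is None:
            continue  # missing data in either sample: locus not compared
        common = Counter(gx) & Counter(gy)  # multiset intersection of alleles
        shared += sum(common.values())
        scored += 1
    if scored == 0:
        raise ValueError("no loci scored in both samples")
    return 1.0 - shared / (ploidy * scored)


def distance_matrix(samples: dict[str, Sequence[Genotype]]) -> dict[tuple[str, str], float]:
    """All pairwise dissimilarities, keyed by (sample_a, sample_b)."""
    names = list(samples)
    return {(a, b): allele_matching_dissimilarity(samples[a], samples[b])
            for a in names for b in names}
```

For example, two samples sharing both alleles at one locus, one allele at a second and with the third locus missing in one of them get d = 1 - 3/4 = 0.25.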

DNA extraction and PCR genotyping
DNA of good quality and sufficient quantity, as confirmed by 0.8% (w/v) agarose gel electrophoresis and spectrophotometry, was obtained for all samples, and all samples except 408_4 and 5_4 worked well in subsequent PCR. These two samples were removed from the dataset, so the final number analysed was 88. Generally, PCR products were 19 bp longer than the expected amplified fragment size because of the incorporation of the 5′-M13 tag [8].
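The 19 bp offset is simply the length of the M13 tail quoted in the methods. A minimal sketch for converting between tagged and untagged fragment sizes follows; the helper names are hypothetical. Sizes called against the LIZ standard can still deviate slightly from true sequence length because of mobility differences, which this correction does not address.

```python
M13_TAG = "CACGACGTTGTAAAACGAC"  # 5' tail on the forward primers, as given in the methods


def expected_tagged_size(untagged_bp: int, tag: str = M13_TAG) -> int:
    """Size expected on the capillary once the tag is incorporated."""
    return untagged_bp + len(tag)


def untagged_size(observed_bp: float, tag: str = M13_TAG) -> float:
    """Remove the tag length from a called fragment size."""
    return observed_bp - len(tag)


if __name__ == "__main__":
    print(len(M13_TAG), expected_tagged_size(150), untagged_size(169.0))
```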
Across the 88 samples, the total number of alleles amplified by the 52 markers was 274, ranging from 2 to 14 alleles per locus with a mean of 4.89. The major allele frequency ranged from 0.18 to 0.93 with a mean of 0.57. The Polymorphic Information Content (PIC) ranged from 0.13 to 0.88 with a mean of 0.5 across all loci, and availability varied between 64 and 100% with a mean of 92.8%. Heterozygosity ranged from 0 to 0.94, with a mean of 0.25. Details of these results are presented in Table 2.
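The per-locus statistics reported above can be computed from diploid-scored genotypes as sketched below. This assumes the standard definitions: gene diversity 1 - Σ p_i², the widely used PIC formula 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², observed heterozygosity as the share of scored genotypes carrying two different alleles, and availability as the share of samples with a scored genotype. PowerMarker's own estimators may apply sample-size corrections, so values from this sketch need not match its output exactly. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Optional, Sequence

Genotype = Optional[tuple[int, int]]  # two allele sizes in bp; None if not scored


def locus_summary(genotypes: Sequence[Genotype]) -> dict[str, float]:
    """Summary statistics for one locus across all samples."""
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("locus has no scored samples")
    counts = Counter(allele for g in scored for allele in g)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]  # allele frequencies p_i
    sum_sq = sum(p * p for p in freqs)
    pic = 1.0 - sum_sq - sum(2 * (pi * pi) * (pj * pj) for pi, pj in combinations(freqs, 2))
    return {
        "n_alleles": len(counts),
        "major_allele_freq": max(freqs),
        "gene_diversity": 1.0 - sum_sq,
        "pic": pic,
        "heterozygosity": sum(1 for a, b in scored if a != b) / len(scored),
        "availability": len(scored) / len(genotypes),
    }


if __name__ == "__main__":
    toy = [(100, 100), (100, 100), (104, 104), (100, 104), None]  # made-up locus
    print(locus_summary(toy))
```

For the made-up locus in the example (allele frequencies 0.625 and 0.375), gene diversity is 0.469, PIC is about 0.359, heterozygosity 0.25 and availability 0.8.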

Bulk sample strategy
Bulked samples were compared with individual samples using Arlequin software [10], and the results are presented in Table 3. Individual samples, with the data from the 5 individuals per accession combined, detected the largest number of alleles per locus 95% of the time. All of the pooling strategies detected the largest number of alleles less frequently than the combined individual samples, and bulked DNA samples were more effective than bulked leaf samples. Bulked DNA from 3 individuals (3 DB) detected the largest number of alleles 53% of the time and, overall, detected 86% of all the alleles that the individual samples did. Bulked DNA from 5 individuals (5 DB) and bulked leaf samples from 3 individuals (3 LB) detected the largest number of alleles 50% of the time (87% and 86% of all alleles detected by individual samples, respectively), and bulked leaf samples from 5 individuals (5 LB) did so 43% of the time, detecting 83% of the alleles that the individual samples did.
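The two figures reported per strategy (how often a bulk matched the allele count of the combined individuals, and what share of their alleles it recovered) can be computed as sketched below. How exactly these percentages were derived is not stated, so treat this as one plausible reading; the function name, locus names and allele sizes in the test are made up.

```python
from typing import Mapping


def compare_bulk(individual_alleles: Mapping[str, set[int]],
                 bulk_alleles: Mapping[str, set[int]]) -> tuple[float, float]:
    """Return (share of loci where the bulk found at least as many alleles as the
    combined individuals, share of the individuals' alleles also found in the bulk)."""
    loci = [locus for locus in individual_alleles if locus in bulk_alleles]
    if not loci:
        raise ValueError("no loci in common")
    matched = sum(1 for locus in loci
                  if len(bulk_alleles[locus]) >= len(individual_alleles[locus]))
    total = sum(len(individual_alleles[locus]) for locus in loci)
    if total == 0:
        raise ValueError("no alleles scored in the individual samples")
    recovered = sum(len(individual_alleles[locus] & bulk_alleles[locus]) for locus in loci)
    return matched / len(loci), recovered / total
```

Applied per accession and strategy (3 DB, 5 DB, 3 LB, 5 LB), the first value corresponds to "detected the largest number of alleles x% of the time" and the second to "detected y% of the alleles that individual samples did".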
The various sampling strategies were also compared for their effectiveness in detecting the major allele within each accession using PowerMarker [9], and all strategies detected the major allele between 93 and 100% of the time.

Phylogenetic analysis
Fig. 1 depicts the unweighted neighbour-joining dendrograms of the allelic data from the 52 SSRs that worked well (Fig. 1a) and from the subset of 20 markers selected for use in further genetic diversity studies on finger millet (Fig. 1b). In both dendrograms, the samples from each of the ten accessions grouped into tight individual clusters, with the exception of Acc 408_2 in Fig. 1a, which was loosely associated with Acc 386.
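For readers unfamiliar with the method, the classic Saitou and Nei neighbour-joining algorithm is sketched below. We assume it corresponds to DARwin's unweighted option, but this is not DARwin's implementation. It repeatedly joins the pair of nodes minimising Q(a, b) = (n - 2)d(a, b) - r(a) - r(b), where r is the sum of a node's distances to all other current nodes, and returns an unrooted Newick string. The function name is hypothetical.

```python
from typing import Mapping, Sequence


def _lookup(d: Mapping[tuple[str, str], float], a: str, b: str) -> float:
    """Symmetric lookup: accept either (a, b) or (b, a) as the key."""
    return d[(a, b)] if (a, b) in d else d[(b, a)]


def neighbour_joining(names: Sequence[str], d: Mapping[tuple[str, str], float]) -> str:
    """Unweighted (Saitou-Nei) neighbour joining; returns an unrooted Newick tree."""
    if len(names) < 3:
        raise ValueError("need at least three samples")
    nodes = list(names)
    dist = {a: {b: (0.0 if a == b else _lookup(d, a, b)) for b in nodes} for a in nodes}
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes if b != a) for a in nodes}
        # choose the pair minimising the Q criterion
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # branch lengths from the joined pair to their new parent node
        la = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        rest = [c for c in nodes if c not in (a, b)]
        dist[new] = {new: 0.0}
        for c in rest:
            dist[new][c] = dist[c][new] = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
        nodes = rest + [new]
    # resolve the last three nodes as an unrooted trifurcation
    a, b, c = nodes
    la = 0.5 * (dist[a][b] + dist[a][c] - dist[b][c])
    lb = 0.5 * (dist[a][b] + dist[b][c] - dist[a][c])
    lc = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"
```

On an additive toy matrix (A and B forming one pair, C and D another), the sketch recovers the generating tree with its branch lengths; real SSR dissimilarities are rarely additive, so branch lengths can come out slightly negative.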

Discussion
The PCR protocol reported here is used routinely in the laboratory at ICRISAT-Nairobi and generally works well for high-throughput SSR genotyping of sorghum, pigeonpea and groundnuts, even though it did not work well for 16 of the 82 markers tested in this study. However, since it is important to have a standardised protocol that works well across all DNA samples, it was decided to focus only on the markers that worked well with this protocol. Further elimination of the 13 monomorphic markers and one highly heterozygous and heterogeneous marker resulted in a total of 52 markers that were assessed, of which four (UGEP5, UGEP51, UGEP95 and UGEP103) were scored as amplifying duplicate loci. This subset of 52 markers (listed in Table 2), amplifying 56 loci, was used to identify the best SSR kit for diversity assessment.

Table 2 Details of the 52 markers that worked well in this study and were assessed to compile a genotyping tool kit, arranged according to PIC. The first 20 markers comprise the tool kit.
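The marker bookkeeping in this paragraph can be checked with simple arithmetic: 82 tested, minus 16 that failed with the generic protocol, minus 13 monomorphic markers, minus one highly heterozygous and heterogeneous marker gives 52 markers, and scoring the four duplicate-locus markers as two loci each gives 56 loci. The helper below is hypothetical and only restates these counts.

```python
def marker_funnel(tested: int = 82, failed_pcr: int = 16, monomorphic: int = 13,
                  problematic: int = 1, duplicate_locus_markers: int = 4) -> dict[str, int]:
    """Counts remaining after each filtering step (defaults taken from the text)."""
    after_pcr = tested - failed_pcr           # worked with the generic protocol
    polymorphic = after_pcr - monomorphic     # informative for diversity
    assessed = polymorphic - problematic      # excludes the ambiguous marker
    loci = assessed + duplicate_locus_markers  # each duplicate marker scored as two loci
    return {"after_pcr": after_pcr, "polymorphic": polymorphic,
            "assessed": assessed, "loci": loci}
```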
This study set out to identify a set of 20 SSR markers because reported genetic diversity studies have used sets of 15 to 50 SSR markers for germplasm collections ranging from a few hundred to thousands of accessions, such as for sorghum (41 SSRs, 3367 accessions) [12], as well as the sets developed for global germplasm diversity assessment by the Generation Challenge Programme (GCP) for barley (15 SSRs, 2692 accessions), chickpea (35 SSRs, 3000 accessions), coconut (30 SSRs, 201 accessions), groundnuts (21 SSRs, 911 accessions), pearl millet (21 SSRs, 1000 accessions), pigeonpea (20 SSRs, 952 accessions) and pearl millet (20 SSRs, 1000 accessions) [13,14]. The sets with larger numbers of markers were developed for crops with relatively abundant genomic resources, such as sorghum and chickpea, whereas smaller sets of about 20 markers were used for more neglected orphan crops such as pearl millet and pigeonpea, likely because fewer markers were available to select from, as was also the case in this study with finger millet. Table 2 indicates that across all samples, the total number of alleles amplified was 274 (2 to 14 alleles per marker, mean of 4.89). This was in line with the mean of 4 alleles per locus for 17 SSRs reported by Arya et al. [15], but lower than the 6.42 alleles per locus for 79 SSRs reported by Dida et al. [16] for cultivated and wild Eleusine species from Africa and India. The major allele frequency, which indicates how widely the dominant allele amplified by a particular marker is spread across the samples, ranged from 0.18 to 0.93 (mean 0.57), and PIC ranged from 0.13 to 0.88 (mean 0.5) across all markers. Gene diversity values obtained in this study (mean 0.55, range 0.14 to 0.89) were also comparable with those obtained by Arya et al. [15] (mean 0.47, range 0 to 0.73) and Dida et al. [16] (mean 0.47). Availability, an indication of how well a marker worked, ranged from 64 to 100% with a mean of 92.8%. These criteria were used to identify the 20 most suitable markers for genetic diversity assessment in finger millet.

Table 3 Summary of sample by marker, indicating which bulked samples captured the largest numbers of alleles that were contributed by the sum of all the individuals within an accession.

Bulk sample strategy
Bulking samples allows a larger number of genotypes to be analysed cost-effectively, compared with analysing multiple individuals from a single accession. This was considered a feasible approach for finger millet, which is 99% inbred and behaves like a diploid species [6], with 2 homozygous alleles per locus. We also considered whether a single sample from an accession could detect the predominant within-accession diversity. Genotyping data for all samples were scored as diploid, selecting the 2 most prominent alleles in each sample. As a consequence, private and rare alleles were largely eliminated from bulked samples, and these samples could not detect all the alleles present in every individual.
Results showed that, as expected, individual samples (data from 5 individuals per accession combined) captured the largest number of alleles per locus 95% of the time and therefore displayed the maximum genetic diversity per accession. Among the bulked samples (Table 3), bulked DNA samples were marginally more effective than bulked leaves at capturing the same number of alleles per locus as the combined 5 individual samples (86-87% compared with 83-86%). All samples often exhibited fewer than the total number of alleles per accession because rare and unique alleles were excluded from the dataset during both allele scoring and data curation, as explained above.
When the 5 individual samples from an accession were compared, each presented the major allele 93% of the time on average, whereas bulked samples did so 95% (3 DB), 96% (5 DB), 98% (5 LB) and 100% (3 LB) of the time. Since only the 2 most prominent alleles were considered in each sample, the alleles excluded from bulked samples relative to individual samples were rare alleles that would not have contributed to the genetic diversity estimates within the population. Among the bulking strategies, 3 DB detected the maximum number of alleles most often and 3 LB captured the major allele most often, but these differences relative to individual samples were not significant. Therefore, depending on the available resources, either individual plants or bulked DNA samples can be used.
In bulked samples, a minority of individuals (1 or 2) from an accession could present an allele different from that of the majority in a group of 5. This complicated the scoring of the data, since multiple alleles, often in odd numbers, confound the data for a diploid or homozygous tetraploid crop. In most cases in this study, such a unique allele presented as a smaller peak in the GeneMapper profile because it was not contributed equally by all individuals, and it did not complicate the selection of the 2 major alleles. However, in some cases the PCR step may have amplified one locus preferentially over the others; since this could not be inferred from the GeneMapper profiles, the two major alleles (highest peaks) were selected in such cases. For some markers (UGEP5, UGEP51, UGEP95 and UGEP103), there were clearly more than 2 alleles in the majority of samples, even in single samples. This was most likely due to amplification of either a heterozygous or a duplicate locus in this tetraploid crop. For these markers, all genotypes were assessed to confirm that the multiple alleles occurred consistently; because this was the case for all four, the marker data were split and evaluated as two individual loci. However, when very few individuals presented such multiple alleles, a decision had to be made on which 2 alleles were the major alleles to include in the dataset. To ensure a robust set of markers, such ambiguous markers were avoided wherever possible.
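The diploid scoring of bulks described above can be sketched as follows, under the idealised assumption that peak height is proportional to the number of plants contributing an allele (ignoring the preferential amplification noted above). The optional min_relative_height threshold is a hypothetical parameter, not part of the described procedure; with its default of 0 the two highest peaks are always kept, as in the text. Ties are broken by the shorter allele, which is also an arbitrary choice.

```python
from collections import Counter
from typing import Sequence


def score_bulk(plant_alleles: Sequence[int], min_relative_height: float = 0.0) -> tuple[int, int]:
    """Score a bulk of homozygous plants as a diploid genotype (two allele sizes).

    plant_alleles holds one allele size per plant in the bulk. The second peak is
    kept only if its height is at least min_relative_height times the highest peak.
    """
    if not plant_alleles:
        raise ValueError("empty bulk")
    peaks = Counter(plant_alleles)  # allele size -> number of contributing plants
    ranked = sorted(peaks, key=lambda allele: (-peaks[allele], allele))
    top = ranked[0]
    if len(ranked) == 1 or peaks[ranked[1]] < min_relative_height * peaks[top]:
        return (top, top)  # scored as homozygous
    second = ranked[1]
    return (min(top, second), max(top, second))
```

With the default setting, a single off-type plant in a bulk of 5 (alleles 150, 150, 150, 150, 154) is scored as 150/154, and with three alleles present (150, 150, 150, 154, 158) the rarer 158 allele is dropped, which mirrors how rare alleles were lost from bulked samples.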

SSR tool kit identification
The allelic data from the selected 56 loci were analysed using PowerMarker V3.25 [9], and the markers were ranked using four main criteria: (1) their ability to discriminate among the different genotypes, measured as polymorphic information content (PIC); (2) availability, which indicated how often the marker worked across the 88 samples analysed; (3) the number of alleles per marker; and (4) the frequency of the major allele. Highly heterozygous and heterogeneous markers were avoided. The best 20 SSR markers for finger millet genotyping are indicated in Table 2. A phylogenetic analysis was carried out with the allelic data for all 52 markers as well as for the 20 markers selected as the "genotyping kit" (Fig. 1). As expected, the topologies of the two dendrograms were similar, indicating that the selected 20 markers represent the data from the entire set of 52 markers. Therefore, genetic analysis using the subset of 20 markers should capture the same genetic diversity as the 52 markers. Interestingly, this list differed substantially from the set of markers prescribed by the GCP Bioinformatics Central Registry [13] for finger millet genotyping (http://gcpcr.grinfo.net/index.php?app=datasets&inc=files_list). For the germplasm analysed in this study, several of the GCP markers were either monomorphic, exhibited low PIC or did not work well.
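The text names four ranking criteria but not how they were combined. One simple, assumed approach is a lexicographic sort (PIC descending, then availability and number of alleles descending, then major allele frequency ascending) after excluding ambiguous markers, sketched below; a weighted composite score would be an equally defensible alternative. The class, function and marker names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerStats:
    name: str
    pic: float
    availability: float      # fraction of samples with a scored genotype
    n_alleles: int
    major_allele_freq: float
    ambiguous: bool = False  # e.g. highly heterozygous/heterogeneous scoring


def select_kit(markers: list[MarkerStats], k: int = 20) -> list[str]:
    """Rank usable markers lexicographically by the four criteria and keep the top k."""
    usable = [m for m in markers if not m.ambiguous]
    ranked = sorted(usable, key=lambda m: (-m.pic, -m.availability,
                                           -m.n_alleles, m.major_allele_freq))
    return [m.name for m in ranked[:k]]
```

A strictly lexicographic order lets PIC dominate; ties in PIC are broken by availability, so a marker that fails in many samples is ranked below an equally polymorphic but more reliable one.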

Conclusions
The results from this study present the best set of microsatellite markers that could be assembled from the finger millet SSR primers currently available in the public domain for genetic diversity assessment of finger millet germplasm. One obvious shortcoming was that very few of these markers have been mapped [3], so it was not clear how well they are spread across the genome compared with the SSR reference set recently reported by Billot et al. [12]. In addition, there were very few markers to work with from the outset, emphasizing the need for further investment in finger millet genomics to allow additional marker and genetic resource development for this important orphan crop. This set of markers was subsequently used successfully to assess the genetic diversity of 340 finger millet accessions from Kenya, Tanzania and Uganda, as reported by Manyasa et al. [17], and of 76 Ethiopian accessions comprising cultivated finger millet (E. coracana subsp. coracana) and its wild relatives Eleusine intermedia, Eleusine indica, Eleusine multiflora and Eleusine floccifolia [18].

Financial support
This study formed part of the project "Delivering New Sorghum and Finger Millet Innovations for Food Security and Improving Livelihoods in Eastern Africa" supported by the Swedish International Development Agency (SIDA) (Project no: 01/2010) Bio-Innovate programme.