Genetic relationship and diversity in a sesame (Sesamum indicum L.) germplasm collection using amplified fragment length polymorphism (AFLP)

Background: Sesame is an important oil crop in tropical and subtropical areas. Despite its nutritional value and its historical and cultural importance, sesame has received little research attention, particularly regarding its genetic diversity. The aims of the present study were to clarify the genetic relationships among 32 sesame accessions from the Venezuelan Germplasm Collection, which represents genotypes from five diversity centres (India, Africa, China-Korea-Japan, Central Asia and Western Asia), and to determine the association between geographical origin and genetic diversity using amplified fragment length polymorphism (AFLP).

Results: High genetic variability was found within the germplasm collection. A total of 457 AFLP markers were recorded, 93% of them polymorphic. Jaccard similarity coefficients between pairs of accessions ranged from 0.38 to 0.85. The UPGMA dendrogram grouped 25 of the 32 accessions into two robust clusters but revealed no association between genotype and geographical origin: Indian, African and Chinese-Korean-Japanese accessions were distributed throughout the dendrogram. Principal coordinates analysis yielded a similar pattern. When the accessions were assigned to five groups according to geographic origin, Nei's coefficient of population differentiation showed that only 20% of the total diversity was due to differences among groups. Similarly, analysis of molecular variance (AMOVA) attributed only 5% of the total variation to differences among groups. This small but significant difference arose because the Central Asian group had lower genetic variation than the other diversity centres studied.

Conclusion: Our sesame collection was genetically highly variable and showed no association between geographical origin and AFLP patterns. This result suggests considerable gene flow among diversity centres. Future germplasm collection strategies should focus on sampling a large number of plants. Covering many diversity centres is less important, because each centre, with the exception of Central Asia, represents a major part of the total diversity in sesame. The same recommendation holds for the choice of parents for segregating populations used in breeding projects. The traditional assumption that selecting genotypes of different geographical origin maximizes the diversity available to a breeding project does not hold in sesame.


Background
Sesame (Sesamum indicum L.) is one of the most ancient crops [1]. It is grown in tropical and subtropical areas [2] on 6.5 million hectares worldwide, producing more than three million tons of seed [3]. India, Sudan, Myanmar and China are the most important producers, together accounting for 68% of world production. Sesame seed is highly nutritious (50% oil and 25% protein); it is traditionally consumed directly and used as a source of oil whose excellent quality is due to natural antioxidants such as sesamin and sesamol [4]. Potential benefits of sesame for human health have recently renewed interest in this ancient crop.
The aim of the present study was to clarify the genetic relationships among 32 sesame accessions from the Venezuelan Germplasm Collection, which represents genotypes from five geographical regions, and to determine the relationship between geographical distribution and genetic diversity.

AFLP results
A total of 457 AFLP markers were recorded using eight primer combinations on the 32 sesame accessions; 93% of these markers were polymorphic (Table 1). Fifty-nine percent of the markers ranged from 100 to 300 nucleotides in size. Forty-seven bands (10.3%) were unique: 25 belonged to African accessions, 10 to Indian accessions, 8 to China-Korea-Japan accessions, 3 to Central Asian accessions and 1 to a Western Asian accession.
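Counts like these can be derived directly from the binary presence/absence matrix. The following is a minimal Python sketch, not the authors' workflow: it assumes rows are accessions and columns are bands, and it assumes that a "unique" band is one carried by exactly one accession (counting bands private to a single region would be an equally plausible reading). The function name `band_summary` is hypothetical.

```python
import numpy as np

def band_summary(matrix, regions):
    """Summarise a binary AFLP matrix (rows = accessions, columns = bands).

    matrix  : 2-D array of 0/1 values (1 = band present)
    regions : region label of each accession (one per row)
    """
    m = np.asarray(matrix, dtype=bool)
    regions = np.asarray(regions)
    n_acc, n_bands = m.shape
    present = m.sum(axis=0)                      # accessions carrying each band
    # Polymorphic: present in some, but not all, accessions
    polymorphic = (present > 0) & (present < n_acc)
    # Assumed definition of "unique": carried by exactly one accession
    unique_cols = np.flatnonzero(present == 1)
    owners = m[:, unique_cols].argmax(axis=0)    # row index of the single carrier
    unique_by_region = {}
    for region in regions[owners]:
        unique_by_region[str(region)] = unique_by_region.get(str(region), 0) + 1
    return {
        "bands": n_bands,
        "pct_polymorphic": 100.0 * polymorphic.sum() / n_bands,
        "unique": len(unique_cols),
        "unique_by_region": unique_by_region,
    }
```

As a check on the reported figures, 47 unique bands out of 457 correspond to 47/457 ≈ 10.3%.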

Phenetic analysis
Jaccard similarity coefficients ranged from 0.38 (between one accession from India and one from Korea) to 0.85 (between one accession from Turkey and one from Syria), with an average of 0.65. Mean similarity coefficients within each geographical region were 0.59 for Africa, 0.61 for China-Korea-Japan, 0.63 for India, 0.68 for Western Asia and 0.80 for Central Asia. Figure 1 shows a UPGMA dendrogram constructed from the similarity coefficients. Bootstrapping identified two robust groups at a similarity value of 0.65 (bootstrap values of 90% and 93%), which together included 25 of the 32 accessions. The cophenetic correlation coefficient (0.95) indicated little distortion between the original similarity matrix and the similarities implied by the dendrogram. Furthermore, the standard deviation of the two main clusters was less than 4% (see legend to Figure 1). Figure 2 shows the positions of the same 32 accessions on the first two principal coordinates, which together account for 74% of the total variation among accessions.

Table 2 summarises the estimated Nei's genetic diversity parameters: only 20% of the total variation in allele frequencies corresponds to differences among groups. Average diversity within groups (HS) ranged from 0.14 for Central Asia to 0.21 for Africa. Genetic distances among groups were very low (Table 3). Central Asian accessions had the lowest genetic identity with the other geographical regions, i.e. the lowest probability of sharing the same allele frequencies across all sampled loci. Table 4 shows the analysis of molecular variance: 5% of the variance among the 32 AFLP patterns was due to differences among groups (fixation index FST = 0.0514) and 95% to differences within groups. Despite its small size, the among-group component was statistically significant (P < 0.05).
Table 5 shows that this result is explained by the difference between the Central Asian accessions and the other diversity centres: only the pairwise comparisons involving Central Asia were significant (P < 0.05).
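Principal coordinates analysis (classical multidimensional scaling) places the accessions on axes derived from a distance matrix. The sketch below uses NumPy and SciPy rather than NTSYSpc and rests on two assumptions: the input is Jaccard distance (1 - similarity), and each axis's share of variation is its eigenvalue divided by the sum of the positive eigenvalues. NTSYSpc may centre or scale the matrix differently, so a value such as the 74% reported above need not be reproduced exactly.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def pcoa(distances, n_axes=2):
    """Classical principal coordinates analysis of a square distance matrix.

    Returns coordinates on the first n_axes axes and the share of variation
    each axis represents (relative to the sum of positive eigenvalues).
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = centring @ (-0.5 * d**2) @ centring     # Gower's double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale eigenvectors by sqrt(eigenvalue); negative eigenvalues are clipped to 0
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained

# Usage with a binary AFLP matrix (rows = accessions):
# profiles = np.array(..., dtype=bool)
# coords, explained = pcoa(squareform(pdist(profiles, metric="jaccard")))
```

Because PCoA only needs a distance matrix, the same function works with any dissimilarity, which is why ordination and clustering can share one Jaccard matrix.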

Discussion
Sesamum indicum L. has high genetic variability, which should be taken into account when planning conservation strategies or when sesame variability is exploited in breeding programs. This high level of polymorphism in sesame has been reported before for morphological traits [8,9], but early molecular studies did not confirm it. Isozyme studies concluded that cultivated sesame has a narrow genetic base [10,11]. However, the number of loci sampled in these studies was limited, and isozymes reflect only coding sequences. Furthermore, isozyme analysis cannot detect synonymous mutations and misses many non-synonymous ones [27].
A RAPD-based study on 36 Indian accessions and 22 accessions from other countries [12] and a study on a Turkish sesame collection [13] both concluded that sesame has a high level of genetic variability. An ISSR-based study on Korean accessions and cultivars from 12 countries found a low level of polymorphism with this marker, but the authors cautioned that their method had low resolution and problems with visualization [14]. In general, methods based on arbitrarily primed PCR tend to reveal a higher degree of genetic variability than other methods [28].

Tables 2 and 3, especially the GST value of 0.20, support the lack of association between geographical origin and population differentiation. Because AFLP markers are dominant, heterozygosity cannot be observed directly. Three methods are available for calculating allele frequencies from dominant marker data: a classical approach based on the Hardy-Weinberg assumption, which we used in this work (see Material and Methods for details); a modification of the classical approach that excludes loci with three or fewer recessive homozygotes [29]; and a Bayesian approach [30]. Applied to AFLP data, these three methods yielded essentially identical allele frequencies; furthermore, potential biases in the estimation of null-allele frequency are largely eliminated in highly polymorphic dominant marker data [31]. This finding supports confidence in our estimates of Nei's parameters.
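The classical Hardy-Weinberg approach and the Nei parameters built on it can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Popgene's code: band absence is treated as the recessive homozygote (q²), indices are computed per band and averaged over bands, groups are weighted equally, and GST is formed from the averaged DST and HT rather than by averaging per-band GST values. The corrected method [29] and the Bayesian method [30] are not implemented; the function names are hypothetical.

```python
import numpy as np

def dominant_allele_freqs(band_matrix):
    """Hardy-Weinberg allele frequencies at dominant loci.

    band_matrix: rows = individuals, columns = loci, 1 = band present.
    Band absence is taken as the recessive homozygote, frequency q**2.
    """
    m = np.asarray(band_matrix, dtype=float)
    q = np.sqrt(1.0 - m.mean(axis=0))   # null (band-absent) allele frequency
    return 1.0 - q, q                   # (p, q) at each locus

def nei_diversity(band_matrix, groups):
    """Nei's HT, HS, DST and GST, computed per band and averaged over bands.

    Groups are weighted equally. GST is nan if every band is monomorphic (HT = 0).
    """
    m = np.asarray(band_matrix, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    p, q = zip(*(dominant_allele_freqs(m[groups == g]) for g in labels))
    p, q = np.array(p), np.array(q)          # shape: (n_groups, n_bands)
    h = 1.0 - p**2 - q**2                    # gene diversity of each group at each band
    p_bar, q_bar = p.mean(axis=0), q.mean(axis=0)
    ht = 1.0 - p_bar**2 - q_bar**2           # total diversity at each band
    HT, HS = ht.mean(), h.mean()
    DST = HT - HS                            # diversity among groups
    return {
        "HT": HT, "HS": HS, "DST": DST, "GST": DST / HT,
        "H_by_group": {str(g): float(v) for g, v in zip(labels, h.mean(axis=1))},
    }
```

In this framework, a GST of 0.20 means that DST is one fifth of HT, leaving 80% of the total diversity within groups.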
Nevertheless, results obtained with Nei's parameters should be interpreted cautiously. HT is called "average heterozygosity" when it is calculated from co-dominant marker data and "average genetic diversity" or "heterogeneity" when it is calculated from dominant marker data. With dominant markers, the number of real loci may be overestimated, whereas the number of alleles per locus is underestimated. Thus, heterogeneity measures have only relative value and cannot be compared with values obtained from other molecular markers [32]. Moreover, average heterozygosity can be estimated reliably from a small number of individuals only when a large number of loci (>50) is analysed and average heterozygosity is low (<0.1) [33]. Our heterogeneity values were higher than 0.1; however, we are more interested in the partition of variation within and among groups of accessions than in the absolute values.

Figure 2. Biplot of principal coordinates analysis for 32 sesame accessions.
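Small samples are the reason for using unbiased estimators [33]. A minimal sketch of Nei's (1978) unbiased genetic identity and distance for biallelic loci follows, written with NumPy; it is an illustration, not Popgene's implementation. It assumes allele frequencies have already been estimated (for example with the Hardy-Weinberg approach used here) and that n counts the individuals sampled per group, treated as diploid (2n genes). The function name is hypothetical.

```python
import numpy as np

def nei_unbiased_distance(p_x, p_y, n_x, n_y):
    """Nei's unbiased genetic identity (I) and distance (D) for biallelic loci.

    p_x, p_y : frequency of one allele at each locus in groups X and Y
    n_x, n_y : number of individuals sampled in X and Y (2n genes each)
    """
    p_x = np.asarray(p_x, dtype=float)
    p_y = np.asarray(p_y, dtype=float)

    def homozygosity(p):
        # Sum of squared allele frequencies at each biallelic locus
        return p**2 + (1.0 - p) ** 2

    # Small-sample corrected within-group identities, averaged over loci
    j_x = np.mean((2 * n_x * homozygosity(p_x) - 1) / (2 * n_x - 1))
    j_y = np.mean((2 * n_y * homozygosity(p_y) - 1) / (2 * n_y - 1))
    # Between-group identity, averaged over loci
    j_xy = np.mean(p_x * p_y + (1.0 - p_x) * (1.0 - p_y))
    identity = j_xy / np.sqrt(j_x * j_y)
    return identity, -np.log(identity)
```

The correction matters most when n is small: the factor (2n·Σx² - 1)/(2n - 1) removes the upward bias of the raw within-group homozygosity, which is why the unbiased identity can occasionally exceed 1 for very similar groups.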
The AMOVA results in Table 4 also support the lack of association between geographical origin and population differentiation. AMOVA detected significant differences among geographical regions, but these accounted for only 5% of the total variation. Table 5 shows that the differences between Central Asian accessions and the other centres were responsible for this part of the variation. These results, together with the fact that all Central Asian accessions were grouped in one cluster, indicate narrow variation within this centre compared with the variation in AFLP patterns across the whole germplasm collection. Furthermore, Central Asian accessions had the lowest HS value (0.14), even though this group had more polymorphic loci than the Western Asian accessions. A possible explanation is that allele frequencies in Central Asia were close to 0 or 1, which could be a consequence of genetic drift. If strong gene flow caused the lack of association between geographical origin and genetic differences in sesame, gene flow into Central Asia must have been limited in recent times.
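AMOVA partitions squared distances between AFLP profiles into among-group and within-group components and tests the among-group share by permuting group labels. The sketch below is a simplified one-level version in NumPy, not Arlequin's implementation; it assumes the squared Euclidean distance between binary profiles (the number of differing bands) and groups as the only hierarchical level. The function name and the default permutation count are illustrative.

```python
import numpy as np

def amova_phi_st(profiles, groups, permutations=999, seed=0):
    """One-level AMOVA: PhiST and a permutation p-value.

    profiles : rows = accessions, columns = bands (0/1)
    groups   : group label of each accession
    Undefined (division by zero) if every profile is identical.
    """
    x = np.asarray(profiles, dtype=float)
    groups = np.asarray(groups)
    # Squared Euclidean distance between binary profiles = number of differing bands
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)

    def phi(labels):
        n = len(labels)
        ssd_total = d2.sum() / (2 * n)       # sum over i<j of d2, divided by N
        ssd_within, sizes = 0.0, []
        for g in np.unique(labels):
            idx = np.flatnonzero(labels == g)
            sizes.append(len(idx))
            ssd_within += d2[np.ix_(idx, idx)].sum() / (2 * len(idx))
        k = len(sizes)
        ms_among = (ssd_total - ssd_within) / (k - 1)
        ms_within = ssd_within / (n - k)
        n0 = (n - sum(s * s for s in sizes) / n) / (k - 1)
        var_among = (ms_among - ms_within) / n0   # among-group variance component
        return var_among / (var_among + ms_within)

    observed = phi(groups)
    rng = np.random.default_rng(seed)
    hits = sum(phi(rng.permutation(groups)) >= observed for _ in range(permutations))
    return observed, (hits + 1) / (permutations + 1)
```

A PhiST of about 0.05 with a permutation p-value below 0.05 would correspond to the pattern reported here: a small but significant among-group component.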
The lack of association between geographical distribution and molecular-marker-based classification in sesame has previously been attributed to the exchange of sesame among widely separated locations [14]. However, that study used predominantly commercial cultivars derived by systematic selection from material of unreported origin, whereas the origin of the material used in our study is known.
The genetic variability of Indian sesame accessions is high [6], as shown with both molecular [12] and morphological markers [9]. Sesame appears to have been domesticated in India [5], which could explain the high genetic variability among Indian accessions. In our study, however, African and Chinese-Korean-Japanese accessions showed genetic variability as high as that of the Indian accessions.
Africa has also been considered the origin of sesame [2,34], because most wild Sesamum species are endemic there. Reports on the variability of cultivated sesame in Africa are conflicting: both low [12] and high [35] levels of variability have been reported, and our results support the latter. Some authors consider Abyssinia (Ethiopia) the basic diversity centre for sesame [6,7].
Accessions from China, Korea and Japan, which we grouped into one diversity centre, have been studied separately by other authors. An isozyme study found a unique allele in Chinese-Japanese accessions [11]. China is considered a secondary centre of diversity for sesame [7]. Genetic variability among Korean landraces is higher than among Korean cultivars [14].
A Turkish sesame collection showed a high level of genetic variability [13], but its relationship to accessions from other geographical regions has not been investigated. We analysed only three accessions from Western Asia. Remarkably, the highest similarity among all 32 accessions was found between two of them (one from Syria and one from Turkey); the third accession was not closely related to these two.
The distribution of genetic diversity in a plant species depends on its evolution and breeding system, on ecological and geographical factors and often on human activities [36]. Cross-pollination may play a role, since it can reach up to 60% in sesame, depending on the presence of suitable insects at flowering time [12]. Ecological and geographical factors do not appear to have played an important role in the evolution of sesame, since we found no association between genetic diversity and accession origin.
Through migration and trade, people have moved sesame among geographical areas for millennia, causing a steady gene flow among them. The oldest remnants of sesame, found at Harappa in the Indus Valley of the Indian subcontinent [1], date the beginning of these activities to at least 5500 BP. We therefore believe that human activity has been the most important factor shaping the current genetic structure of sesame.

Conclusion
AFLP analysis revealed a high degree of genetic polymorphism among sesame accessions within all diversity centres except Central Asia. Phenetic analysis showed no association between geographic origin and AFLP patterns. According to Nei's diversity indices, 80% of the total genetic diversity in sesame lies within diversity centres. This result was corroborated by analysis of molecular variance (AMOVA), which attributed 95% of the variation among accessions to variation within diversity centres. These results suggest that conservation strategies need not cover all diversity centres, as long as they sample a sufficient number of accessions. Similarly, choosing parent genotypes for breeding programs from many diversity centres rather than from just one (except Central Asia) is unlikely to increase the variability among progeny significantly. Regardless of how many diversity centres are sampled, both conservation strategies and breeding programs would benefit from using AFLP or another genome fingerprinting technique to maximise the genetic variability covered by the selected set of genotypes.

Plant material
Thirty-two accessions from the Centro Nacional de Investigaciones Agropecuarias (CENIAP) Germplasm Bank (Table 6) were grown in the greenhouse. These accessions originate from five geographical regions representing the proposed diversity centres for sesame [6,7] and the geographical areas included in the germplasm bank. They were chosen randomly within each geographical region, with more accessions taken from the two proposed centres of origin (India and Africa). The accessions were assigned to one of the following diversity centres: India, Africa, China-Korea-Japan, Central Asia and Western Asia.

DNA extraction
Three grams of young apical leaves from six plants per accession were collected and used for DNA extraction. Leaves were ground in liquid nitrogen, and the tissue powder was dispersed in CTAB buffer (2.3 g sorbitol, 1 g N-laurylsarcosine, 0.8 g CTAB, 4.7 g sodium chloride and 1 g polyvinylpolypyrrolidone in a total volume of 100 ml of 20 mM EDTA, 10 mM Tris, pH 8.0) containing 0.4 mg proteinase K and 20 µL mercaptoethanol. The homogenates were incubated for 10 min at 42°C and 10 min at 65°C, cooled to room temperature and extracted with 8 ml of chloroform-isoamyl alcohol (24:1). Phases were separated by centrifugation for 10 min at 12,000 RCF (relative centrifugal force, i.e. × g). Polyethylene glycol (PEG 6000, SERVA Electrophoresis, Germany) stock solution (30%) was added to the aqueous phase to a final concentration of 6%; after mixing and 30 min of incubation at room temperature, the precipitated DNA was sedimented by centrifugation for 20 min at 12,000 RCF. Pellets were washed twice with 70% ethanol and dissolved in 200 µL TE buffer (10 mM Tris/HCl pH 8.0, 0.1 mM EDTA). Then 500 µL of 5 M ammonium acetate solution was added, and samples were kept at 0°C for 30 min and centrifuged for 30 min at 4°C and 18,000 RCF. Next, 500 µL of isopropanol was added to the supernatant, and DNA was precipitated for 10 min at room temperature. Samples were centrifuged at 18,000 RCF at room temperature for 10 min; pellets were washed twice with 70% ethanol, dried and dissolved in 200 µL of TE buffer. DNA concentration was determined by electrophoresis in a 0.8% agarose gel against a lambda DNA standard.
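The PEG step is a simple dilution: adding a volume v of 30% stock to an aqueous phase of volume V gives 0.30·v = 0.06·(V + v), so v = V/4. The aqueous-phase volume is not stated above, so the hypothetical helper below takes it as a parameter and assumes the aqueous phase initially contains no PEG.

```python
def stock_volume_to_add(sample_volume, stock_conc, final_conc):
    """Volume of stock solution to add to reach final_conc in the mixture.

    Solves stock_conc * v = final_conc * (sample_volume + v) for v,
    assuming the sample initially contains none of the solute.
    Concentrations must share a unit (e.g. % w/v); the result has the
    same volume unit as sample_volume.
    """
    if not 0 <= final_conc < stock_conc:
        raise ValueError("final concentration must be non-negative and below the stock")
    return final_conc * sample_volume / (stock_conc - final_conc)

# Example: an illustrative 4 ml aqueous phase, 30% PEG stock, 6% final
print(stock_volume_to_add(4.0, 30.0, 6.0))  # 1.0 (ml)
```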

AFLP analysis
AFLP analysis was performed as originally proposed [37] with minor modifications [38]. For each reaction, 250 ng of DNA was used. DNA was digested with 10 U EcoRI and 3 U Tru1I (both enzymes from MBI Fermentas, Germany) in the buffer recommended by the manufacturer, in a total volume of 15 µl, at 37°C for 90 min followed by 90 min at 65°C. 10 µl of a solution with a final concentration

Statistical analysis
Bands were recognised automatically by GelCompar II using a profiling threshold of 5% (relative to the maximum value within each lane). After band matching, the results were exported as a binary matrix. This matrix was used to study the phenetic relationships among AFLP patterns by cluster analysis (GelCompar II) and by ordination, specifically principal coordinates analysis, using NTSYSpc 2.11T [39]. Clustering was performed with Jaccard's similarity coefficient and the unweighted pair group method with arithmetic mean (UPGMA). The resulting dendrogram was evaluated with three measures, also computed in GelCompar II: bootstrap analysis [40] to assess the robustness of the dendrogram topology, the standard deviation of the cluster nodes, and the cophenetic correlation as an estimate of the faithfulness of the cluster analysis [41]. First, bootstrap analysis was carried out to identify robust groups at the same similarity level; the standard deviation of these groups was then calculated. Finally, dendrogram-derived similarities were compared with the experimental similarities to obtain the cophenetic correlation.
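GelCompar II and NTSYSpc are GUI programs; as an open-source cross-check, the clustering steps described above can be approximated with SciPy. This is a sketch under stated assumptions: SciPy's "average" linkage is used as UPGMA, the bootstrap resamples bands (columns) with replacement, and a group counts as supported in a replicate only when exactly that set of accessions forms a cluster. GelCompar II's own bootstrap and node standard-deviation procedures may differ in detail, and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

def upgma_jaccard(profiles):
    """UPGMA tree from Jaccard distances, the similarities, and the cophenetic correlation."""
    x = np.asarray(profiles, dtype=bool)
    dist = pdist(x, metric="jaccard")        # 1 - Jaccard similarity, condensed form
    tree = linkage(dist, method="average")   # average linkage = UPGMA
    # Correlation between tree-implied and original distances
    # (identical to the correlation computed on similarities, since s = 1 - d)
    coph_corr, _ = cophenet(tree, dist)
    return tree, 1.0 - dist, coph_corr

def clusters_of(tree, n_leaves):
    """Set of accessions under every internal node of a SciPy linkage matrix."""
    members = [frozenset([i]) for i in range(n_leaves)]
    for left, right, _, _ in tree:           # row k creates cluster n_leaves + k
        members.append(members[int(left)] | members[int(right)])
    return set(members[n_leaves:])

def bootstrap_support(profiles, group, replicates=1000, seed=0):
    """Fraction of replicate trees (bands resampled with replacement) containing `group`."""
    x = np.asarray(profiles, dtype=bool)
    n, n_bands = x.shape
    target = frozenset(group)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(replicates):
        cols = rng.integers(0, n_bands, size=n_bands)
        tree = linkage(pdist(x[:, cols], metric="jaccard"), method="average")
        hits += target in clusters_of(tree, n)
    return hits / replicates
```

A call such as `bootstrap_support(profiles, members_of_cluster)` returns a proportion that plays the same role as the bootstrap values reported in the Results, although the numbers need not match GelCompar II exactly.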
To study the genetic structure of Sesamum indicum L., the accessions were grouped into five sets according to their geographical origin. Gene diversity indices, namely total diversity (HT), average diversity within groups (HS), diversity among groups (DST) and the coefficient of population differentiation (GST) [42], were calculated for each band and then averaged over all bands. Heterozygosity cannot be observed directly in AFLP data because AFLP markers are dominant. To calculate allele frequencies, the absence of a band was treated as the homozygous state of a recessive allele (q²), and the presence of a band as either the dominant homozygote (p²) or the heterozygote (2pq). Accordingly, q was estimated as the square root of the frequency of band absence, and p = 1 - q. Unbiased measures of genetic identity and genetic distance between groups were also calculated [33]. All of Nei's parameters, which are based on gene frequencies, were calculated using Popgene v. 1.32. To analyse genetic structure without assuming gene frequencies, an analysis of molecular variance (AMOVA) [43] was carried out using Arlequin v. 2.000 to estimate variance components for the AFLP patterns and to parti-