Genetic Diversity and Population Structure Analysis Using Ultra-High-Throughput Diversity Arrays Technology (DArTseq) in Sesame (Sesamum indicum L.) of Different Origins

Sesame is an important oil crop widely cultivated in Africa and Asia. Characterization of the genetic diversity and population structure of sesame genotypes from these continents can be used to design breeding methods. In the present study, 300 sesame genotypes were used, comprising 209 local collections, 75 exotic collections, and 16 released varieties provided by the Ethiopian Biodiversity Institute and research centers. The panel was genotyped using two types of ultra-high-throughput Diversity Arrays Technology (DArT) markers (silicoDArT and SNP). Both marker types were used to assess the genetic diversity and population structure of the sesame germplasm. A total of 6115 silicoDArT and 6474 SNP markers were reported, of which 5002 silicoDArT and 4638 SNP markers passed the quality control parameters. The average polymorphic information content (PIC) values of the silicoDArT and SNP markers were 0.07 and 0.08, respectively. For further analysis, the allele frequency at each SNP site was calculated and SNPs with a minor allele frequency (MAF) < 0.01 were removed, leaving 2997 high-quality SNPs evenly distributed across the whole genome for subsequent analysis. All genotypes used in this study came from eight geographical origins. The genetic diversity analysis showed that the average nucleotide diversity of the panel was 0.14. When Africa was subdivided by direction, North Africa was the most diverse, but at the continent level Asia was more diverse than Africa. The genetic distance among the sesame populations ranged from 0.015 to 0.394, with an average of 0.165. The sesame populations were clustered into four groups. The structure analysis divided the panel into four subgroups, and 21 genotypes were classified as admixed. This indicates that genotypes from the same origin did not always group together.

Abbreviations: ... NGS: Next-generation Sequencing; PIC: Polymorphic Information Content; PCA: Principal Component Analysis; RE: Restriction Enzyme; RTA: Real-time Analysis; SNNP: Southern Nations, Nationalities, and Peoples’; SNP: Single Nucleotide Polymorphism.


Background
Sesame (Sesamum indicum L., 2n = 26), a member of the Pedaliaceae family, is one of the most ancient oil crops and has been grown widely in both tropical and subtropical areas since time immemorial [1,2].
Archeological findings revealed that cultivated sesame traces its progenitor back to wild populations native to South Asia [3]. However, there is still some controversy as to the center of domestication. Some claim that sesame has been cultivated in South Asia since the time of the Harappan civilization, from where it spread west to Mesopotamia before 2000 B.C. [3]. Others believe that the crop was first cultivated in Africa and later taken to India at a very early date [4,5]. Still others propose that sesame was the main oil crop grown by the Indus Valley Civilization, from where it was likely transferred to Mesopotamia around 2500 B.C. [6].
Sesame is produced in numerous parts of the world for various purposes, but more than 96% of world sesame seed production comes from Africa and Asia, with India, China, Burma (Myanmar), Sudan, Nigeria, the United Republic of Tanzania, Ethiopia, and Uganda being the key contributors [7]. Sesame seeds are good sources of fat, protein, carbohydrates, fiber, and essential minerals. The seeds are composed of 44-57% oil, 18-25% protein, and 13-14% carbohydrates [8]. Sesame, also referred to as the "queen of oilseeds", is used in sweets such as sesame bars and halva (dessert) and in bakery products, or is milled to obtain high-grade edible oil [9]. Despite its nutritional and economic importance in several parts of the world, little focus is given to sesame research at either the national or the international level [1,10,11,12].
In Ethiopia, sesame is among the most important oil crops in terms of both area coverage and total national annual production [13]. However, the farm-level productivity of sesame in Ethiopia is not only far below the crop's genetic yield potential of 2 t ha⁻¹ [14] but also low compared with productivity in other countries such as Egypt (1.29 t ha⁻¹), Nigeria (1.1 t ha⁻¹), Tanzania (1 t ha⁻¹), and China (1.4 t ha⁻¹) [15]. Improved varieties released in Ethiopia are reported to yield 0.3 to 1.3 t ha⁻¹ under rain-fed conditions and 1 to 2.4 t ha⁻¹ under irrigation on research stations [16], and 0.4-1.3 t ha⁻¹ on farmers' fields, as reported in the books released by the Ministry of Agriculture and Rural Development (MoARD) from 2010 to 2017 [17].
Nevertheless, the national average yield is low (0.68 t ha⁻¹) [18]. On the other hand, Ethiopia is also considered one of the centers of genetic diversity, with an immense wealth of genetic diversity in its germplasm collections available for exploitation through genetic improvement in future breeding [19]. The effective utilization of this germplasm in breeding programs depends largely, among other things, on systematic genetic characterization to unveil the magnitude and pattern of genetic diversity available in the germplasm, which enables the identification of useful genes and the assessment of the progress that can be made through future breeding. Despite the large amount of both locally collected and introduced germplasm held in the Ethiopian gene bank and in breeders' stocks, information on the use of molecular markers for the characterization of genetic diversity is limited [16]. It has been well established since the time of N.I. Vavilov [20] that Ethiopian sesame landraces have valuable genetic diversity at the morphological level [21][22][23][24][25]. Past studies of diversity at the molecular level, and of its eco-geographic distribution and microcenters, have several limitations because they were based on a few older marker types and/or a limited number of genotypes [25][26][27].
More recently, high-throughput marker systems, particularly single-nucleotide polymorphisms (SNPs) and Diversity Arrays Technology (DArT) markers, have become the genetic markers of choice for genetic analyses, including the characterization of germplasm, because of several comparative advantages such as abundance in the genome, efficiency, low cost, and speed [35][36][37][38][39].
Over the last decade, DArT has generated two types of markers, namely silicoDArT and SNP markers.
SilicoDArT markers are dominant markers scored for the presence or absence of a single allele, whereas DArTseq-based SNPs are co-dominant markers; both have been successfully applied in studies of genetic diversity [40][41][42][43][44] and population structure [45,46] in several crop species. The present study, which is the first of its kind to use the DArT platforms in sesame, was designed to determine the magnitude and pattern of genetic diversity in a collection of 300 sesame germplasm accessions and thereby to generate information on the eco-geographic distribution and microcenters of genetic diversity in Ethiopia.

Marker discovery by DArTseq and quality analysis
Through the application of the complexity reduction method, a total of 6115 polymorphic silicoDArT markers (Table 1) were generated, of which 5065 were aligned with the reference sesame genome obtained from NCBI, 326 were on scaffolds, and 724 were unknown markers. Alignment was based on the updated genome assembly and annotation available at http://ocri-genomics.org/Sinbase_v2.0 (genome assembly) and http://ocri-genomics.org/Sin_SNP_430RIL.tar.gz (SNP information). The 5065 aligned silicoDArT markers were distributed over all 13 chromosomes of sesame (Fig. 1 and Table 1), with an average of 389.62 silicoDArT markers per chromosome; the maximum number of silicoDArT markers (643) was found on chromosome 6. The average number of silicoDArT markers per Mbp over all chromosomes was 19.26; chromosome 6 showed the maximum (24.80), while chromosome 7 had the minimum (13.90). All of the markers (6115) showed ≥ 95% reproducibility (Fig. 3B) and had a call rate ≥ 80% (Fig. 4B), with an average of 90.55% (Table 1). However, low-frequency markers can affect statistical analysis [47]. As such, 1113 markers, comprising those with an extremely low one ratio (< 0.05) together with the scaffold and unknown markers, were excluded from the analysis. In total, 5002 silicoDArT markers passed all the quality parameters and were selected for the study. Among the 5002 informative markers, around 0.18% fell in the PIC class 0.45 to 0.50 and 71.57% in the class 0 to 0.05 (Fig. 5). The remaining markers were distributed across the other PIC classes (0.2-8.72% per class). The median (0.027) was therefore well below the average PIC value of 0.07, indicating a strongly skewed distribution.
A total of 6474 SNP markers (Table 1) were also identified, of which 5821 were aligned with the updated genome assembly and annotation available at http://ocri-genomics.org/Sinbase_v2.0 (genome assembly) and http://ocri-genomics.org/Sin_SNP_430RIL.tar.gz (SNP information). Of the others, 305 were on scaffolds and 348 were unknown markers. Based on the updated genome assembly and annotation, the 5821 SNP markers were distributed over all 13 chromosomes of sesame (Fig. 2 and Table 1), with an average of 447.7 SNP markers per chromosome; the maximum number of SNPs (733) was found on chromosome 3. The average number of SNP markers per Mbp over all chromosomes was 22.15; chromosome 3 showed the maximum (28.36), while chromosome 7 had the minimum (14.20).
SNP markers had an average reproducibility of 98% (Fig. 3A) and an average call rate of 86% (Fig. 4). All SNP markers had ≥ 94% reproducibility, and 3,036 of them were 100% reproducible (Fig. 3A). The call rate ranged from 40% to 100%. Around 21.44% of SNP markers displayed a call rate < 75% (Fig. 4A) and were therefore not considered for this study. As with the silicoDArT markers above, SNPs that had missing rates > 0.25 and a one ratio ≤ 0.5 were removed, and a set of 4638 SNPs was generated. These markers were determined to be highly informative, with an average PIC value of 0.08 and a median of 0.07. Around 40.40% of the markers were in the PIC class 0-0.05, and only three markers were in the highest PIC class (0.45 to 0.50) (Fig. 5). The share of markers in the remaining PIC classes increased from the highest class towards the lowest, ranging from 0.23% to 23.87% per class.
For further analysis, the allele frequency at each SNP site was calculated and the SNPs were filtered by minor allele frequency (MAF): the MAF of the SNPs varied from 0 to 49.6%, with an average of 5.1%, and ∼61.29% of the SNPs had a low frequency (MAF < 0.05) across the 300 accessions. After excluding the SNPs with MAF < 0.01, 2997 (∼64.61%) high-quality SNPs remained (Additional file 2: Table S2), evenly distributed across the whole genome, for use in subsequent analysis.
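The call-rate and MAF filtering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: genotypes are assumed to be coded as 0, 1, or 2 copies of the alternate allele with `None` for a missing call, the thresholds (call rate ≥ 0.75, MAF ≥ 0.01) are taken from the text, and the function names and example markers are hypothetical.

```python
def call_rate(calls):
    """Fraction of samples with a non-missing call (None marks a missing call)."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF for a biallelic SNP coded as 0/1/2 copies of the alternate allele."""
    called = [c for c in calls if c is not None]
    if not called:
        return 0.0  # no data: treat as uninformative
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_snps(markers, min_call_rate=0.75, min_maf=0.01):
    """Return the names of markers passing both thresholds (assumed cut-offs)."""
    kept = []
    for name, calls in markers.items():
        if call_rate(calls) < min_call_rate:
            continue
        if minor_allele_frequency(calls) < min_maf:
            continue
        kept.append(name)
    return kept


# Hypothetical toy data: three SNPs scored on eight accessions.
example_markers = {
    "snp_keep": [0, 1, 2, 0, 0, 1, 0, 0],
    "snp_monomorphic": [0, 0, 0, 0, 0, 0, 0, 0],
    "snp_missing": [0, 1, None, None, None, 2, 0, 0],
}
print(filter_snps(example_markers))
```

In this toy example only `snp_keep` survives: the monomorphic marker fails the MAF threshold and the third marker fails the call-rate threshold.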

Analysis of genetic diversity
The number of accessions, number of alleles, genetic diversity, heterozygosity, polymorphism information content (PIC), and major allele frequency of the eight populations are shown in Table 2. The mean PIC values per SNP locus in the sesame collections and introductions from Africa, Amhara, Asia, BG, Improved, Oromia, SNNP, and Tigray were 0.18, 0.09, 0.14, 0.06, 0.1, 0.06, 0.06, and 0.12, respectively. At the continent level, Asia had higher heterozygosity, PIC, and gene diversity than Africa, but a lower number of alleles and a lower major allele frequency.
Based on the sample size and the result for Africa, we further partitioned the African collection into four geographical origins by direction (North, South, East, and West Africa) and compared them with the Asian collection. North Africa had higher heterozygosity, PIC, and gene diversity than the other three African directions and Asia. The East Africa (Ethiopian) collection had the lowest heterozygosity, PIC, and gene diversity and the highest number of alleles and major allele frequency.
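To make the summary statistics compared above concrete, the following sketch computes gene diversity, observed heterozygosity, and major allele frequency for one population. It is an illustration only: the paper does not state which estimators its software applied (for example, whether a sample-size correction was used), so the uncorrected gene diversity 1 − Σp² is an assumption here, as are the 0/1/2 genotype coding and the function names.

```python
def locus_stats(calls):
    """Per-locus statistics for a biallelic SNP coded 0/1/2 (None = missing).

    Assumes at least one non-missing call at the locus.
    """
    called = [c for c in calls if c is not None]
    n = len(called)
    q = sum(called) / (2 * n)  # alternate allele frequency
    p = 1 - q                  # reference allele frequency
    return {
        # Uncorrected expected heterozygosity (assumed estimator).
        "gene_diversity": 1 - (p * p + q * q),
        # Share of heterozygous calls among called genotypes.
        "heterozygosity": sum(c == 1 for c in called) / n,
        "major_allele_freq": max(p, q),
    }


def population_summary(loci):
    """Average each statistic over all loci of one population."""
    per_locus = [locus_stats(calls) for calls in loci]
    return {
        key: sum(stats[key] for stats in per_locus) / len(per_locus)
        for key in per_locus[0]
    }


# Hypothetical population of four accessions scored at two loci.
toy_population = [
    [0, 0, 1, 2],
    [0, 0, 0, 0],
]
print(population_summary(toy_population))
```

A near-inbred panel, as reported in this study, would show low heterozygosity even where gene diversity is moderate, which is why the two statistics are reported separately.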

Genetic relationships among Germplasm
The average genetic distance among the 300 sesame accessions and improved varieties was 0.165. The highest genetic distance (0.394) was found between two Oromia landraces, "Najjoo-68 (gabaa kamijaa)" and "17712", while the minimum genetic distance (0.015) was found between the Tigray landrace "9694" and the Egyptian landrace "227888" (Additional file 3: Table S3). Based on the magnitude of Nei's genetic distance among the eight pre-grouped populations, moderate differentiation was revealed between the population from Asia and those from Amhara (0.052), BG (0.056), Oromia (0.057), and SNNP (0.073), whereas all other pairwise Nei genetic distances between populations indicated lower differentiation (< 0.05) (Table 3). Cluster analysis of the 300 accessions from the eight geographical origins was performed using the allele-sharing distance (ASD) method, and the results allowed us to group them into four clusters.
The analysis also helped us to identify accessions and genotypes wrongly assigned to another geographical origin. The first cluster comprised mainly accessions from different countries of Africa (28), together with all accessions introduced from Asia (7), accessions from the different regions of Ethiopia, namely Amhara (8), Benshangul-Gumz (4), Oromia (10), SNNP (1), and Tigray (12), and 7 improved varieties. The second cluster contained the highest number of accessions collected from the different regions of Ethiopia, namely Amhara (40), Benshangul-Gumz (34), Oromia (41), SNNP (2), and Tigray (23), along with 4 improved varieties; the remaining 13 accessions were introduced from different African countries. The third cluster comprised mainly accessions from the Tigray region (25) and a small number from Amhara (8) and Oromia (1), along with 5 improved varieties; the remaining 4 accessions were introduced from different countries of Africa. Cluster 4 comprised only accessions introduced from Egypt (23) (Fig. 6). There was no relationship between cluster grouping and the pedigree of accessions and genotypes, although most of the sesame genotypes introduced from Egypt grouped in cluster 4.
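As an illustration of the distance underlying the cluster analysis, the sketch below computes a pairwise allele-sharing distance between accessions. The paper does not give the exact formula used by its software, so this assumes one common definition: per locus, the distance is half the absolute difference in alternate-allele counts (0 when both alleles are shared, 0.5 for one, 1 for none), averaged over loci where both accessions have a call. The accession names and genotypes are hypothetical.

```python
def allele_sharing_distance(a, b):
    """Mean per-locus allele-sharing distance between two accessions.

    Genotypes are coded 0/1/2; loci with a missing call (None) in either
    accession are skipped. Returns None if no locus is comparable.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return sum(abs(x - y) for x, y in pairs) / (2 * len(pairs))


def distance_matrix(samples):
    """Symmetric ASD matrix as a nested dict keyed by accession name."""
    names = list(samples)
    return {
        i: {j: allele_sharing_distance(samples[i], samples[j]) for j in names}
        for i in names
    }


# Hypothetical accessions scored at four SNP loci.
toy_samples = {
    "acc_A": [0, 1, 2, 0],
    "acc_B": [0, 2, 0, None],
    "acc_C": [0, 1, 2, 0],
}
matrix = distance_matrix(toy_samples)
print(matrix["acc_A"]["acc_B"])
```

A matrix of this kind can then be passed to any hierarchical or neighbor-joining clustering routine to obtain a dendrogram such as the one in Fig. 6.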

An analysis of molecular variance (AMOVA)
Analysis of molecular variance (AMOVA) among the 300 sesame germplasm accessions indicated that 8.31% of the variance was due to genetic differentiation among the populations, 15.24% was accounted for by genetic differentiation among individuals within populations, and the remaining 76.44% was due to differences within individuals (Table 4). When Africa was subdivided by direction and compared with Asia, 22.17% of the total molecular variation observed was due to differentiation between the different directions of Africa and Asia, 10.69% was accounted for by genetic differentiation among individuals within the different directions of Africa and Asia, and the remaining 67.12% was due to differences within individuals (Table 6).

In the population structure analysis, the log-likelihood [LnP(D)] increased continuously, and an inflection was evident when K increased from 1 to 4 (Fig. 7A). Thus, the most likely value of K was 4. The number of subgroups (K) was further validated by the second-order statistic ∆K. The ∆K value showed a peak at K = 4 (Fig. 7B), which supported the classification of the panel into four major subgroups (Fig. 7C). The genetic diversity within each population was described by the expected heterozygosity, which varied from 0.06 (POP2) to 0.31 (POP4). The expected heterozygosity of POP1 was 0.22 and that of POP3 was 0.18. The genetic divergence among the populations, measured by Nei's net nucleotide distance (D), was highest between POP3 and POP4 (0.22) and lowest between POP1 and POP2 (D = 0.09). The mean fixation index of the sub-populations ranged from 0.39 (POP4) to 0.77 (POP2) (Table 7).

Principal component analysis (PCA) was performed to further assess the population subdivisions. The PCA based on DArTseq SNP markers revealed four distinct groups of sesame accessions and genotypes, with the first two principal components accounting for 93.7% of the total variation (Fig. 8A). PC1 explained 84% of the genetic variation, while PC2 explained 7.7%. However, some intermediate lines (admixture) made the grouping less than clear-cut. When these intermediate lines were taken into account, the panel could be divided into four clusters (Fig. 8B) corresponding to the four subgroups inferred from the STRUCTURE result.
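The ∆K criterion used above to choose K can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes the commonly used form ∆K = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], computed from the mean and standard deviation of LnP(D) over replicate runs at each K, and the log-likelihood values below are invented solely to show the calculation.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Compute Delta K for each interior K from replicate LnP(D) values.

    lnp_runs maps K -> list of LnP(D) values from replicate runs (at least
    two per K). K values whose replicate runs have zero spread are skipped
    because Delta K is undefined there.
    """
    ks = sorted(lnp_runs)
    means = {k: mean(lnp_runs[k]) for k in ks}
    result = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        spread = stdev(lnp_runs[k])
        if spread == 0:
            continue
        second_diff = abs(means[next_k] - 2 * means[k] + means[prev_k])
        result[k] = second_diff / spread
    return result


# Invented replicate log-likelihoods: steep gains up to K = 4, then a plateau.
toy_lnp = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-800.0, -802.0],
    4: [-700.0, -702.0],
    5: [-690.0, -692.0],
    6: [-680.0, -682.0],
}
dk = delta_k(toy_lnp)
best_k = max(dk, key=dk.get)
print(best_k)
```

With these invented values the statistic peaks at K = 4, mirroring the pattern the text describes for Fig. 7B; note that ∆K cannot be evaluated for the smallest and largest K tested.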

Discussion
To develop sesame varieties with desirable traits, knowledge of the genetic diversity and relationships among germplasm accessions is vitally important. Molecular markers reflect the actual level of genetic variation among genotypes at the DNA level; hence, they provide a more accurate estimate of variation than either phenotypic or pedigree information [49].
This study assessed the suitability of DArT platforms for the genomic dissection of sesame. A total of 6115 silicoDArT markers were developed, of which 5002 markers provided robust information on the sesame genome in the absence of sequence information. In addition, DArTseq SNPs provided 6474 informative markers.
The average PIC value of the silicoDArT markers was similar to that of the SNP markers. The abundance of silicoDArT and SNP markers may achieve better genome coverage by sampling a greater number of points across the whole genome, as marker density is highly correlated with gene density [57,58].
Therefore, both silicoDArT and SNP markers may be well suited to genetic diversity studies, association/linkage mapping, and/or sequence-based physical mapping in sesame. Additionally, the co-dominant inheritance pattern of SNP markers may increase the utility of DArT platforms for genetic identity and parentage analysis [59]. Because the number of individuals with a specific genotype can be very small, the effect of rare alleles on genome mapping could extend beyond the effect of small population size alone. In such cases, increasing the number of individuals carrying rare alleles could improve the ability to detect these alleles.
The average genetic diversity in the present study (0.14) was lower than in earlier reports on sesame collections analyzed with SNP markers [29,32,38,39] and SSR markers [65,66]. However, when 1022 SNP markers filtered with a call rate of 97% and MAF > 0.05 were used, similar to the filtering reported in [38], the average genetic diversity (0.19) was higher than in earlier reports on sesame collections analyzed with different marker types [32,38]. The differences observed among studies might stem from the genetic resources used (such as landraces, advanced breeding lines, and cultivars), data filtering methods, sampling approaches, and the number of markers [65]. The type of marker is also an important factor in estimating gene diversity. In general, the genetic diversity estimated from SNPs may be lower than that estimated from SSR markers; however, an accurate estimate of genetic diversity depends on the number of loci rather than the number of alleles [38]. Therefore, sufficiently large numbers of next-generation-sequencing-based SNPs analyzed across the genome can provide accurate estimates of genome-wide diversity in several crop species.
Considering the genotypes by geographical origin, the African collection excluding the Ethiopian regions (0.21) was more diverse than the Asian collection and the collections from the different regions of Ethiopia. However, when compared at the continent level, with the Ethiopian regions included in Africa, Asia (0.17) was more diverse than Africa (0.14), even though the Asian sample was small. This finding was expected because the region of origin of a crop generally shows higher genetic diversity, as reported previously for cotton (Paterson A., 2009) and Oryza spp. [61]. Laurentin and Karlovsky [28] also found higher genetic diversity in sesame accessions collected from Asia.
Based on the sample size and the result for Africa, we further partitioned the African collection into four geographical origins by direction (North, South, East, and West Africa) and compared them with the Asian collection. The North Africa collection (0.23) was more diverse than the other three African directions and also more diverse than Asia. The East Africa (Ethiopian) collection was less diverse than the others. This indicates that, even though Ethiopian sesame is well known in the international market and has its own taste and aroma, further breeding work is needed to broaden its genetic diversity through hybridization and the introduction of highly diverse collections from North Africa and different countries of Asia.
The distribution of heterozygosity across sesame genotypes and SNP markers revealed low values; the average heterozygosity within the sesame panel was 0.1, which suggests that the accessions we used were close to being inbred lines. Hence, the selected accessions are suitable for investigating multiple phenotypic traits in multi-plot field tests over several years and for carrying out GWAS.
The genetic distance matrix among the sesame populations from the 8 geographical origins was also used to construct the clustering tree (Fig. 6). The genetic distances ranged from 0.015 to 0.394, with an average of 0.165. The sesame populations could be clustered into four groups. The clustering dendrogram showed that the majority of sesame accessions from the same origin did not group according to their country of origin, except for the accessions introduced from Egypt. Similar results were reported previously in different sesame germplasm [39,[68][69][70] and in other crops, including wheat [71], finger millet [72], and sorghum [73]. This uneven distribution of sesame accessions relative to geographical origin may be associated with gene flow among the various geographical areas due to the migration of people who traded with other regions for a century or who carried seeds for cultivation.
Similarly, Laurentin and Karlovsky [28] found no association between genetic diversity and accession origin, and they proposed that ecological and geographical factors have not played a significant role in the evolution of sesame. The present AMOVA also supported the possibility of high rates of gene flow between regions, because the genetic variation among the geographical groups accounted for 8.3% of the total variation and the variation among continents accounted for 11.49% of the total molecular variation (Table 3).
Most of the genotypes used in this study have been used as parental lines or have a similar genetic background, so a mixture of pedigrees was observed in all clusters. In our results, the genotypes in Clusters 2 and 3, which mostly originated from different regions of Ethiopia, showed a tendency to cluster together. This result is consistent with the hypothesis that sesame seeds were dispersed to nearby countries by human activities. These distributed sesame genetic resources were later used in further breeding activities to develop the modern cultivars that were commercialized.
Cluster 1 contained accessions originating from two different continents (Africa and Asia), indicating a close genetic relationship between the accessions from East, South, North, and West Africa and the accessions from Asia. This close genetic relationship might be due to the introduction of similar sesame genetic stock into many countries and to material exchange among widely separated locations [74].
Moreover, the exchange of plant materials between Asia and East Africa dates back a long time and is still occurring [75], with a gradual increase in the annual export of raw sesame seeds, mainly for industrial applications. The likelihood of outcrossing between materials from different locations grown within the same area is high, given that cross-pollination in sesame has been reported to occur at a frequency between 5% and 60% [66]. Such crossing could explain the similarity of accessions from the eastern part of Africa and Asia. Similar patterns have also been observed by other researchers [28,69,74].
Cluster 4 shows that genotypes from the same origin can group together: the genotypes in this cluster were all from Egypt (Fig. 6).
Population Structure of the Association-Mapping Panel
The complex breeding history of many important crops and the limited gene flow in most wild plant populations have created complex structures within their germplasm [76]. Detailed knowledge of the population structure in an association panel is thus important to avoid spurious associations [77].
Population structure in sesame has been assessed using different populations. For example, Ali et al., 2007 [68] evaluated 96 sesame accessions collected from different parts of the world and clustered them into just two major groups, which discriminated varieties in association with their geographical origin. In [37], 705 sesame accessions were divided into two clusters using a neighbor-joining tree. Recently, in [38], a K value of 2 was determined by both LnP(D) and ∆K; using a 70% probability-of-membership threshold, the 366 sesame germplasm accessions were divided into three subgroups (Pop 1, Pop 2, and Mixed). In [39], a Mediterranean sesame core collection of 95 agro-morphologically superior accessions from geographically diverse regions on four continents (Asia, Europe, America, and Africa) was divided into three groups using STRUCTURE with K = 3.
Similarly, in our study, a K value of 4 was determined by both LnP(D) and ∆K. Using a 50% probability-of-membership threshold, the structure analysis divided the panel into four subgroups (Pop 1, Pop 2, Pop 3, and Pop 4), and the remaining 21 accessions were classified as admixed, with varying levels of membership shared among the four genetic groups. The occurrence of some admixed/hybrid and introgressive hybrid genotypes indicated frequent hybridization and introgression events. Although the extent and significance of natural hybridization/introgression are unclear [79], new gene combinations between domestic cultivars and their wild or weedy relatives are important for the evolution of domesticated plant species [80].
The genetic diversity within each population was described by the expected heterozygosity (the average distance between individuals in the same cluster), which varied from 0.06 (POP2) to 0.31 (POP4). The expected heterozygosity of POP1 was 0.22 and that of POP3 was 0.18. The genetic divergence among the populations, measured by Nei's net nucleotide distance (D), was highest between POP3 and POP4 (0.22) and lowest between POP1 and POP2 (D = 0.09). The mean fixation index of the sub-populations ranged from 0.39 (POP4) to 0.77 (POP2) (Table 5).
Population genetic structure reflects interactions among species with regard to their long-term evolutionary history, mutation and recombination, genetic drift, reproductive system, gene flow, and natural selection [81,82]. Thus, an understanding of the extent and structure of the genetic diversity of a crop is a prerequisite for the conservation and efficient use of the germplasm available for breeding [83].
The various approaches (STRUCTURE, PCA, and the clustering tree) used to analyze the structure and relationships of the sesame germplasm appeared to provide complementary information. The neighbor-joining tree divided the sesame germplasm into four main clusters, in concordance with the STRUCTURE and PCA results. These results suggest that crossing genotypes from different clusters may yield cultivars with promising agronomic traits.
According to the AMOVA results, 8.3% of the marker variation was explained by differences among the populations from the different geographical regions of the sesame panel, and 11.49% by differentiation between the Asian and African populations. This result suggests the absence of a complicated population structure in our association-mapping panel. By comparison, 22.17% of the marker variation was explained by differences among the populations from the different directions of Africa and Asia, which suggests the presence of some between-population structure when the panel is grouped in this way.
In this study, most collections (225) were from Ethiopia, with smaller collections from West, South, and North Africa and seven collections from 4 Asian countries. Ethiopian sesame has useful characteristics and is often branded as 'Humera', 'Gondar', and 'Welega' types, which are well known in the world market for their white color, sweet taste, and aroma. The Humera and Gondar sesame seeds are suitable for bakery and confectionery purposes, and the high oil content of the Welega sesame seed gives it a major advantage for edible oil production [84]. The collections introduced from the different directions of Africa and from Asia were used to compare the degree of genetic relationship and differentiation with the genetic resources of the Ethiopian collection; broadening genetic diversity in this way can also be used to combine alleles for valuable agricultural traits [86]. The SNPs obtained from this collection could benefit future breeding and association mapping work in sesame. Our diversity analysis of this collection revealed genetic relationships among the accessions that may be valuable for parental selection in sesame improvement research. Therefore, the identification of genetically distant accessions (such as Najjoo-68 (gabaa kamijaa) and 17712) for hybridization in sesame breeding programs has the potential to lead to the development of elite varieties. Based on economically important traits and the SNP-based distances obtained here, a number of accessions can be further selected for different breeding programs.

Conclusions
The present research showed the effectiveness of DArTseq in characterizing the genetic diversity and population structure of sesame collections.
, and Tigray (60), and 16 well-popularized sesame varieties released between 1942 and 2014. Among the 75 exotic collections, 68 were introduced from seven African countries in four directional geographical origins: North Africa (27), South Africa (18), West Africa (17), and East Africa excluding the Ethiopian collection (6). The other 7 were collections introduced from 4 Asian countries. These materials were kindly provided by the Ethiopian Biodiversity Institute (EBI) and by regional and federal research centers. The sampling sites covered a wide range of natural eco-geographical locations, and their description is presented in Fig. 9 and Additional file 1: Table S1.

DNA extraction
Seeds of each sesame genotype were harvested when the plants had matured. Seeds of each genotype were placed in a 2 mm deep-well tube together with 1 steel ball of 30 mm diameter and crushed with a Geno/Grinder, after which 800 µl of lysis buffer was mixed with the powder of each genotype for the TANBead DNA extraction process. The samples were incubated for 1 hour at 65 °C and then centrifuged for 5 minutes to remove plant tissue debris. The lysate was loaded on column #1, and the nucleic acid was extracted with an automated nucleic acid extractor (Maelstrom series). During the process, the silicon dioxide layer coated on the magnetic beads adsorbs nucleic acid from the samples, contaminants are removed with wash buffer, and purified genomic DNA is eluted with elution buffer. At the end of the program, the collected nucleic acid was found in column #6 in a clean tube. DNA quality was evaluated on 0.8% agarose gels, and the concentration was adjusted to 50 ng/µl for GBS analysis.

GBS library preparation and sequencing
DArTseq combines genome complexity reduction methods and next-generation sequencing platforms [87][88][89][90]. DArTseq therefore represents a new implementation of the sequencing of complexity-reduced representations [91] and of more recent applications of this concept on next-generation sequencing platforms [92,93]. DArTseq libraries (96-plex) were prepared for the 312 accessions using 50 ng of DNA each. Briefly, DNA samples were digested individually with the PstI-MseI restriction enzymes. In this technology, the PstI-based complexity reduction method [40] was applied to enrich the genomic representation with single-copy sequences. This method involved digestion of the DNA samples with the rare-cutting enzyme PstI, paired with the secondary, frequently cutting restriction enzyme MseI, ligation with site-specific adapters, and amplification of the adapter-ligated fragments. After digestion with the PstI-MseI pair, a PstI overhang-compatible oligonucleotide adapter (5′-CAC GAT GGA TCC AGT GCA-3′ annealed with 5′-CTG GAT CCA TCG TGC A-3′) was ligated, and the adapter-ligated fragments were amplified following the prescribed standard procedures [40]. After PCR, cluster generation was carried out in the cBOT (Illumina) according to the procedures described by the manufacturer. Briefly, 10 nM DNA of each library was denatured, diluted in hybridization buffer, and loaded into the machine, and clusters were generated in the flow cell by the cBOT using the set of cBOT reagents (bridge amplification). During cluster generation, the molecules of each library were attached to the flow cell surface and amplified to form clonal clusters.
Next-generation sequencing was performed on a HiSeq 2500 sequencer (Illumina, USA) to detect SNP and silicoDArT markers. The flow cell with the clusters generated in the previous step (cBOT) was loaded into the HiSeq 2500 together with the sequencing reagents. The HiSeq 2500 performed sequencing according to user-selected sequencing parameters. All amplicons were sequenced in a single lane. The single-read sequencing was run for 77 cycles.
Real-time Analysis (RTA) ran simultaneously with the sequencing run, and the RTA data were output to a server. The main sequence output consisted of base-calling files (*.bcl files). These files were the input for downstream data conversion. The primary workflow was custom-built software for downstream processing of the *.bcl files. The first step was the conversion of the *.bcl files, which was done by the Illumina bcl2fastq software embedded in the primary workflow. The second step performed two functions at the same time: first, using the target definition from DArTdb, the software split the sequencing reads according to their barcode sequence (demultiplexing); second, it removed reads that fell below the quality filters. Two quality filters were applied: a more stringent one for the barcode sequence and a less stringent one for the remaining part of the sequencing read.
Finally, the fold-compressed sequence tags were copied to DArTdb (Diversity Arrays Technology database, Australia) for permanent storage. We extracted the compressed sequence tags from DArTdb and loaded them into DArTsoft14 for marker data extraction. DArTsoft14 extracts two types of marker data: SNPs and SilicoDArTs. SilicoDArTs are dominant markers and are scored in a binary format: "1" = presence and "0" = absence of the restriction fragment with the marker sequence in the genomic representation of the sample. "-" represents calls with non-zero counts that are too low to score confidently as "1" (often representing heterozygotes). A single nucleotide polymorphism (SNP) can be defined as a variation in the base composition at a single nucleotide position within a specific locus of a single chromosome of the haploid set. In the standard format, SNP markers were presented with the reference and SNP alleles for each marker and genotype. This SNP format can be converted to other formats if required. The report was prepared as a binary file, a read counts file, or both, depending on the order specifications. Two technical replicates of the DNA samples of each of 21 accessions were genotyped to calculate the reproducibility of the marker data. Thereafter, the SNPs and SilicoDArTs obtained were run against the sesame reference genome database (http://ocri-genomics.org/Sinbase/login.htm) to determine on which sesame chromosomes the SNPs and SilicoDArTs were located.

Quality analysis of marker data
The markers were tested for reproducibility (%), call rate (%), polymorphism information content (PIC), and one ratio. Reproducibility was scored as the proportion of technical replicate assay pairs for which the marker score was consistent. The call rate measured the success of reading the marker sequence across the samples and was estimated as the percentage of samples for which the score was either '0' or '1'. PIC is the degree of diversity of the marker in the population and shows the usefulness of the marker for linkage analysis. One ratio is the proportion of the samples for which the genotype score equaled '1'.
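The four quality metrics described above can be illustrated with a minimal Python sketch for a single dominant (silicoDArT-style) marker. This is only an illustration under stated assumptions, not the DArTsoft14 implementation: the function names are hypothetical, a "-" call is represented as `None`, one ratio is taken over all samples, and PIC is computed with the common two-state simplification PIC = 1 - (p² + q²), which the text does not confirm as the formula used in the study.

```python
from typing import List, Optional, Sequence, Tuple

# 1 = fragment present, 0 = absent, None = "-" (not confidently scored).
Score = Optional[int]


def call_rate(scores: Sequence[Score]) -> float:
    """Percentage of samples scored as either 0 or 1."""
    called = sum(1 for s in scores if s is not None)
    return 100.0 * called / len(scores)


def one_ratio(scores: Sequence[Score]) -> float:
    """Proportion of samples scored 1 (assumption: denominator is all samples)."""
    return sum(1 for s in scores if s == 1) / len(scores)


def pic_dominant(scores: Sequence[Score]) -> float:
    """Two-state PIC, 1 - (p^2 + q^2), over called samples (assumed formula)."""
    called: List[int] = [s for s in scores if s is not None]
    p = sum(called) / len(called)  # frequency of score 1
    q = 1.0 - p                    # frequency of score 0
    return 1.0 - (p * p + q * q)


def reproducibility(pairs: Sequence[Tuple[Score, Score]]) -> float:
    """Percentage of technical replicate pairs with identical scores.

    Pairs in which either replicate is unscored are skipped (assumption).
    """
    scored = [(a, b) for a, b in pairs if a is not None and b is not None]
    consistent = sum(1 for a, b in scored if a == b)
    return 100.0 * consistent / len(scored)
```

For a marker scored across eight samples as `[1, 1, 0, None, 1, 0, 1, 1]`, `call_rate` counts the seven called samples out of eight, while `one_ratio` counts the five samples scored '1' out of eight. A marker with no called samples would raise a division error in this sketch and would need to be excluded beforehand.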

Data analysis
DArTseq markers were mapped using the consensus map version 4.0 (www.diversityarrays.com) developed by DArT Pty. Ltd., Australia, and the updated genome assembly and annotation issued by the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, available online at http://ocri-genomics.org/Sinbase_v2.0 (genome assembly) and http://ocri-genomics.org/Sin_SNP_430RIL.tar.gz (SNP information). DArTseq raw data were filtered according to the following marker criteria: minor allele frequency > 0.01 and missing data ≤ 25%. The summary statistics of the filtered DArTseq markers, such as the expected heterozygosity (He) or genetic diversity (GD), minor allele frequency (MAF), and polymorphic information content (PIC), were calculated using PowerMarker v 3.25 [95]. PIC was estimated based on the probability of finding

Figure 3 A

Figure 4 A

Figure 7 Analysis

Figure 9 Map

Table 1
Distribution of DArTseq markers on different sesame chromosomes.
*Indicates the chromosomal size taken from the reference genome published by Wang et al.

Table 2
Summary of the genetic diversity of the 300 sesame accessions based on their different geographical regions

Table 3
Pairwise population Nei's genetic distance showing the magnitude of genetic differentiation between sesame populations from different sources

Table 4
Looking further, in terms of population subdivision among the different directions of Africa and Asia, 22.17%

Table 6
AMOVA between the different directions of Africa and Asia
SNPs was selected for analysis of the population structure. The hierarchical population structure was determined for the entire panel via Bayesian model-based analysis using the STRUCTURE program. As K changed from 1 to 11 by inferring on Delta K of Evanno et al. [48], the log-likelihood value [LnP(D

Table 7
Genetic divergence among (net nucleotide distance) and within (expected heterozygosity) populations, proportion of membership, and mean value of Fst observed from the study of the population structure of 300 sesame accessions and genotypes using DArTseq-SNP markers
Mixed) with 50% levels of membership shared among the three genetic groups (Additional file 5: Table S5). Most accessions of Pop 1 were introduced from different countries of Africa (27) and 7 accessions from different Asian countries, while 18 accessions in total came from different regions of Ethiopia, namely Amhara (n = 4), Benshangul-Gumz (2), Oromia (5), and Tigray (7), along with 2 improved varieties. Pop 2 was the largest group and was mainly collected from the different regions of Ethiopia, namely Amhara (n = 40), Benshangul-Gumz (35), Oromia (42), SNNP (2), and Tigray (20), along with 7 improved varieties; the remaining 13 accessions were introduced from different African countries. The accessions of Pop 3 came mainly from three regions of Ethiopia, namely Amhara (n = 9), Oromia (1), and Tigray (26), along with 4 improved varieties; the remaining 3 accessions were introduced from different countries of Africa. Pop 4 consisted only of accessions introduced from one African country, Egypt (23). In the Mixed group, 19 accessions were collected from different regions of Ethiopia and 2 accessions from two African countries.
When using a probability of membership threshold of 50%, 54, 159, 43, and 23 accessions were assigned to the four subgroups Pop 1, Pop 2, Pop 3, and Pop 4, respectively, while the remaining 21 accessions were classified into a mixed subgroup.
In comparison with other existing marker technologies such as microsatellite markers, DArT markers are well suited to high-throughput work and have merits in terms of cost-effectiveness and time [60]. The effectiveness of silicoDArT and SNP markers varies depending on the type of application. For genetic diversity and linkage mapping, a large number of silicoDArT markers are suitable. However, for genetic identity and product quality testing, both markers can perform equally. Because they make it possible to track alleles from parental genotypes, the co-dominant SNP markers are more suitable than silicoDArT markers for plant identity and parentage analysis. Then, 2997 SNP markers were retained after filtering with a call rate of 75% and a minor allele frequency > 0.01 and were used for the analysis. The proportion of rare SNPs (i.e., MAF < 0.05) we examined amounted to ∼61.29%, which was similar to that reported for the genomes of sesame [38]. In our study, the high proportion of rare SNPs has two explanations. Firstly, since the SNPs were identified via DArTseq conducted with GBS technology, which provides broad genome coverage, they should be less prone to bias than low-coverage sequencing data would be [61]. Secondly, following its recent program to conserve genetic resources, a significant number of minor sesame varieties have been collected and preserved by the Ethiopian Biodiversity Institute and research centers. SNPs with a MAF < 0.05 were removed in several previous studies [62, 63]. However, rare SNPs might also control the expression of a particular phenotype [64].
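The SNP filtering and the rare-SNP proportion discussed above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: it assumes each SNP is coded per sample as the count of the alternative allele (0, 1, or 2), with `None` for a missing call, which differs from the native DArT report format, and all function names are hypothetical. The thresholds (missing data ≤ 25%, MAF > 0.01, rare if MAF < 0.05) are the ones stated in the text.

```python
from typing import Dict, Iterable, Optional, Sequence

# Assumed coding: number of alternative alleles (0, 1, 2); None = missing call.
Dosage = Optional[int]


def missing_rate(dosages: Sequence[Dosage]) -> float:
    """Fraction of samples without a genotype call."""
    return sum(1 for d in dosages if d is None) / len(dosages)


def minor_allele_frequency(dosages: Sequence[Dosage]) -> Optional[float]:
    """MAF over called diploid genotypes, or None if nothing was called."""
    called = [d for d in dosages if d is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(
    genotypes: Dict[str, Sequence[Dosage]],
    min_maf: float = 0.01,
    max_missing: float = 0.25,
) -> Dict[str, float]:
    """Keep SNPs with missing data <= max_missing and MAF > min_maf.

    Returns a mapping from the retained SNP identifiers to their MAF.
    """
    kept: Dict[str, float] = {}
    for snp_id, dosages in genotypes.items():
        if missing_rate(dosages) > max_missing:
            continue
        maf = minor_allele_frequency(dosages)
        if maf is not None and maf > min_maf:
            kept[snp_id] = maf
    return kept


def rare_fraction(mafs: Iterable[float], threshold: float = 0.05) -> float:
    """Fraction of SNPs whose MAF falls below the rare-variant threshold."""
    values = list(mafs)
    return sum(1 for m in values if m < threshold) / len(values)
```

Calling `rare_fraction(filter_snps(genotypes).values())` on a genotype table would give the share of retained SNPs with MAF < 0.05, the quantity reported as about 61.29% in this study. Keeping the MAF cut-off at 0.01 instead of 0.05 is what allows such rare SNPs to remain in the data set.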
The gene diversity values calculated from the 2997 SNPs and 300 genotypes suggest that, among continents, Asia is relatively more genetically diverse and that, among the different directions of Africa, North Africa is relatively more genetically diverse. Although Ethiopian sesame has useful characteristics and its own aroma and taste, the East Africa/Ethiopia collection is less diverse and needs further crossing and introduction of germplasm to create variability that favors improvement for different biotic and abiotic stresses. The local and exotic collections provide useful genetic data for future molecular-based studies. This study also supports the idea that ecological and geographical factors were less influential in the evolution of sesame. This finding provides guidance for the systematic utilization and conservation of the genetic resource and indicates the need for further collection of sesame genotypes from these different origins.
To develop SNP and silicoDArT markers, the DArTseq technology was optimized by replacing a single PstI-compatible adapter with two different adapters corresponding to two different restriction enzyme (RE) overhangs. The PstI-compatible adapter was designed to include the Illumina flow cell attachment sequence, the sequencing primer sequence, and staggered barcode regions of varying length. The reverse adapter contained the flow cell attachment region and the MseI-compatible overhang sequence. Only "mixed fragments" (PstI-MseI) were effectively amplified in 30 rounds of PCR using the following reaction conditions: 1 min at 94 °C for initial denaturation; 30 cycles each consisting of 20 s at 94 °C for denaturation, 30 s at 58 °C for annealing, and 45 s at 72 °C for extension; and finally a 7 min extension step at 72 °C. The genomic representations were generated following the procedures described by Kilian et al. [94].