Genetic diversity and population structure of Ethiopian Capsicum germplasms

We established a collection of 142 Capsicum genotypes from different geographical areas of Ethiopia with the aim of capturing genetic diversity. Morphological traits and high-resolution melting analysis distinguished one Capsicum baccatum, nine Capsicum frutescens and 132 Capsicum annuum accessions in the collection. Measurement of plant growth parameters revealed variation between germplasms in parameters including plant height, stem thickness, internode length, number of side branches, fruit width, and fruit length. Broad-sense heritability was maximum for fruit weight, followed by length and width of leaves. We used genotyping by sequencing (GBS) to identify single-nucleotide polymorphisms (SNPs) in the panel of 142 Capsicum germplasms and found 2,831,791 genome-wide SNP markers. Among these, we selected 53,284 high-quality SNPs and used them to estimate the level of genetic diversity, population structure, and phylogenetic relationships. From model-based ancestry analysis, the phylogenetic tree and principal-coordinate analysis (PCoA), we identified two distinct genetic populations: one comprising 132 C. annuum accessions and the other comprising the nine C. frutescens accessions. GWAS analysis detected 509 SNP markers that were significantly associated with fruit-, stem- and leaf-related traits. This is the first comprehensive report of the analysis of genetic variation in Ethiopian Capsicum species involving a large number of accessions. The results will help breeders utilize the germplasm collection to improve existing commercial cultivars.


Introduction
Members of the genus Capsicum in the Solanaceae family, commonly known as chilli peppers, are major crop plants and are almost cosmopolitan in distribution [1]. Chilli pepper fruits are used as spices, as vegetables and for medicinal purposes [2] and are a significant source of vitamins A and C. They are also used as natural coloring agents, in cosmetics and as active ingredients in host defense repellents. Some are also used as ornamentals [3,4]. The genus includes 27 species, of which five are known to be domesticated [5]. The five cultivated species of Capsicum, namely C. annuum L., C. chinense Jacq., C. frutescens L., C. baccatum L. and C. pubescens Ruiz et Pav., are among the most economically important vegetables worldwide [6].
[...] effectiveness and to the ease with which the resulting data can be converted to universal genotype information from different technological sources [4]. Genotyping by sequencing (GBS) is a genome-wide reduced representation of SNPs obtained using Illumina sequencing technology [18]. The use of restriction enzymes in GBS reduces genome complexity by avoiding the sequencing of repetitive regions, resulting in more straightforward bioinformatics analysis for large genomes [19,20]. It is thus a rapid, high-throughput, genome-wide and cost-effective tool for SNP discovery [18]. It allows genotyping without prior knowledge of the genome of the species and is useful for exploring plant genetic diversity on a genome-wide scale [4]. In the last few years, GBS has been used to investigate the genetic diversity of many crop species, including maize, rice, barley, tomato, wheat, sorghum, soybean, watermelon and Capsicum [21-28].
The present study was undertaken to characterize Capsicum germplasms collected from different localities of the six regions of Ethiopia using morphological and molecular markers, in order to explore the genetic diversity available in a wide collection of germplasm. The data presented herein may help to clarify the diversity of Capsicum in Ethiopia and to apply that information for breeding purposes. Similarly, our findings may give additional insight into the quantitative trait loci controlling fruit weight.

Plant materials
The germplasm collection of 142 genotypes used in this study was obtained from the Ethiopian Biodiversity Institute (EBI). These germplasms were collected from different pepper-growing areas of the country: 47 from eight zones of Amhara (11°39′38.88″ N, 37°57′28.08″ E); five from the Metekel zone of Benishangul Gumuz (10°20′0″ N, 34°40′0″ E); 38 from eight zones of Oromia (7°59′20.62″ N, 39°22′52.25″ E); 40 from five zones of the Southern Nations, Nationalities and Peoples (SNNPs) region (6°3′31.03″ N, 36°43′38.28″ E); one from the Jigjiga area of Somali (7°26′19.43″ N, 44°17′48.75″ E); and two from different weredas (districts) of Tigray (14°8′11.68″ N, 38°18′33.58″ E) (Fig 1). Ten germplasms used in this experiment have no accession passport data and are thought to be recent introductions. Among the germplasms, 42 were classified into particular species of Capsicum by the EBI based on the descriptor: four as C. frutescens and 38 as C. annuum. According to the germplasm descriptors, the EBI collected peppers in Ethiopia during 16 different years since the first germplasm record from the Limu wereda of the Oromia region in 1978. The majority of Capsicum germplasm collection (59%) was done between 1984 and 1990, during which most geographical areas with significant pepper cultivation were covered. In this experiment, three plants of each of the 142 germplasms were grown under greenhouse conditions at Biotong Seed Co. Ltd., Anseong, Republic of Korea, in 2017. Seeds were disinfected using 2% sodium chlorate and 10% trisodium phosphate.

Morphological characterization and statistical analysis
For each germplasm, 12 growth traits and nine flower-, 17 fruit- and two seed-related traits were evaluated according to the descriptions used by the Rural Development Administration (RDA) gene bank, South Korea, with some modifications (S1 Table). Morphological traits such as plant type (PT), plant height (PH), plant width (PW), main stem length (MSL), internode length (INL), number of side branches (NSB), stem thickness (ST), stem color (SC), leaf color (LC), and leaf length and width (LL and LW) were recorded 114 days after sowing. One representative flower from each plant was assessed for stamen number (SN), filament color (FC), anther color (AC), petal color (PC), petal length (PL), petal width (PW), petal number (PN) and calyx shape (CS) (S1 Fig).
Tomato Analyzer version 3.0 was used to measure fruit perimeter (FP), fruit area (FA), fruit width mid height (FWMH), maximum fruit width (MFW), fruit height mid-width (FHMW) and fruit curved height (FCH) as previously described [29]. Spearman's rank correlation coefficients were calculated among all the variables, including the altitude at which the germplasms were collected. Kaiser-Meyer-Olkin (KMO) and Bartlett's tests were performed using SPSS software 22.0 to measure sampling adequacy and sphericity, respectively [30] (S1 Table).
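As an illustration of the rank correlation step, the following minimal sketch computes Spearman's coefficient as the Pearson correlation of rank-transformed values, with tied values receiving their average rank. It is not the SPSS procedure used in the study; the function names and the plant height and altitude values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: plant height (cm) and collection altitude (m.a.s.l.).
plant_height = [48, 95, 120, 150, 175]
altitude = [2570, 2100, 1850, 1300, 1200]
rho = spearman_rho(plant_height, altitude)
print(f"Spearman rho = {rho:.2f}")
```

In this made-up example the tallest plants come from the lowest sites, so the coefficient is strongly negative.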
The phenotypic (PCV) and genotypic (GCV) coefficients of variation were estimated as the corresponding phenotypic and genotypic standard deviations expressed as percentages of the trait grand means, as used by Khan et al. [31]. Estimates of broad-sense heritability in percent were obtained using the formula suggested by Burton and DeVane [32].
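The definitions above can be sketched as follows, assuming the usual forms PCV = 100·√(σ²p)/mean, GCV = 100·√(σ²g)/mean and H² = 100·σ²g/σ²p. The function names and the variance components in the example are hypothetical, and the estimation of the variance components themselves (for example from an ANOVA) is not shown.

```python
import math


def variability_estimates(var_genotypic, var_phenotypic, grand_mean):
    """Return (PCV, GCV, broad-sense heritability), all in percent."""
    pcv = 100 * math.sqrt(var_phenotypic) / grand_mean
    gcv = 100 * math.sqrt(var_genotypic) / grand_mean
    h2 = 100 * var_genotypic / var_phenotypic
    return pcv, gcv, h2


def heritability_from_cv(gcv, pcv):
    """Under the definitions above, H2 (%) equals 100 * (GCV / PCV) ** 2."""
    return 100 * (gcv / pcv) ** 2


# Hypothetical variance components for one trait.
pcv, gcv, h2 = variability_estimates(var_genotypic=16.0,
                                     var_phenotypic=25.0,
                                     grand_mean=20.0)
print(pcv, gcv, h2)  # 25.0 20.0 64.0
```

Under these assumed definitions, heritability can be recovered from the two coefficients alone. With the PCV (99.6%) and GCV (98.6%) reported for fruit weight, `heritability_from_cv(98.6, 99.6)` gives about 98.0%, close to the reported 98.1%.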

HRM-PCR amplification and data analysis
To identify the species of the 142 Capsicum germplasms, high-resolution melting (HRM) analysis was performed as described previously [15,33]. A Rotor-Gene 6000 real-time PCR thermocycler (Corbett Research, Sydney, Australia) was used with the following PCR amplification conditions: 95°C for 10 min; 50 cycles of 94°C for 20 s, 55°C for 20 s, and 72°C for 40 s; 95°C for 60 s; and 40°C for 60 s. For HRM analysis, the temperature was increased by 0.1°C per minute from 65°C to 90°C. A combination of five previously developed markers (S5 Table) was used for species identification [15]. Five reference materials, viz. C. annuum, C. chinense, C. frutescens, C. chacoense and C. baccatum, were used.

DNA extraction and library construction for genotyping by sequencing
Two or three young leaves from each germplasm were used as sources of DNA. Total DNA was extracted using the modified cetyl trimethylammonium bromide (CTAB) method as described previously [34]. The concentration and purity of DNA samples were determined with a NanoDrop 1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). DNA samples with 260/280 nm absorbance ratios above 1.8 were used for analysis [15], and gel electrophoresis was conducted on a 0.8% agarose gel.
Two GBS libraries were constructed based on a modified protocol as used previously [19,35] with a two-enzyme system, PstI (rare cutter) and MseI (frequent cutter). Both were prepared as 96-plex libraries and contained 60 and 82 samples from the Capsicum diversity panel, respectively (S2 Table).

Sequencing data analysis, SNP identification and genome-wide association analysis
Sequencing was performed with an Illumina HiSeq 2500 (Macrogen Inc., Seoul, Korea). Data analysis and SNP identification were performed as described previously [36]. Raw reads were de-multiplexed in accordance with individual barcodes, and the adapter and barcode sequences were removed using the commercially available CLC Genomics Workbench software (version 6.5). Trimmed reads were mapped to CM334 chromosome version 1.6 (Pepper.v.1.6.total.chr.fa) with the Burrows-Wheeler Aligner (BWA) [37]. The SAMtools program was used to group and sort the reads by chromosomal order [38]. The Genome Analysis Toolkit (GATK) program was used to call SNPs over whole chromosomes [39]. From the 142 total germplasms, 12 were omitted from SNP-based analysis due to significant loss of reads.
The 53,284 filtered SNPs were used for genome-wide association (GWAS) mapping. The default settings of the Genomic Association and Prediction Integrated Tool (GAPIT) R package were used to perform GWAS based on the compressed mixed linear model [40]. SNPs with a calling rate of more than 0.1 were retained and FILLIN in TASSEL was used for imputation. R² values and the imputed ratio of minor and major alleles were used to select SNPs of suitable imputation quality. A final filtering was performed based on minor allele frequency of more than 0.05, SNP coverage >0.6 and inbreeding coefficient >0.8. The P values of SNPs from GWAS were subjected to a false-discovery rate (FDR) analysis, and Bonferroni correction was applied to reduce false-positive results from the GWAS analysis. A significance threshold level at a P value of 0.05 was set after Bonferroni multiple-test correction.
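The Bonferroni threshold implied by this setup can be worked out directly. The sketch below assumes that the number of tests equals the 53,284 SNPs used for association mapping; the function names are illustrative and do not belong to GAPIT or TASSEL.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Return the per-test P-value cutoff and its -log10 transform."""
    per_test_p = alpha / n_tests
    return per_test_p, -math.log10(per_test_p)


def is_significant(p_value, alpha, n_tests):
    """True if a single SNP's P value passes the Bonferroni cutoff."""
    return p_value < alpha / n_tests


p_cut, neg_log10_cut = bonferroni_threshold(alpha=0.05, n_tests=53284)
print(f"P cutoff = {p_cut:.3e}, -log10(P) cutoff = {neg_log10_cut:.2f}")
```

Under this assumption the cutoff is a P value of about 9.4 × 10⁻⁷, that is, -log10(P) of about 6.03, which agrees with the threshold quoted for the GWAS results.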

Genetic diversity and population structure analysis
For each SNP, polymorphic information content (PIC), heterozygosity (H2), gene diversity, genotype number, allele number and allele frequency were calculated using PowerMarker software [17], and the genetic diversity of the entire set of Capsicum genotypes as well as of the geographically based subpopulations was also estimated with PowerMarker version 3.25. To investigate the population structure, assess genetic diversity and remove near-duplicates (i.e., highly similar genotypes), both parametric and non-parametric approaches were used. Pairwise geographic distances between accessions, pairwise FST between accessions in the different groups and analysis of molecular variances (AMOVAs) were calculated using GenAlEx 6.503, with 999 permutations for testing variance components [41].
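For readers unfamiliar with these statistics, the sketch below computes gene diversity (expected heterozygosity) and PIC for a single locus from its allele frequencies, assuming the standard definitions He = 1 − Σp² and PIC = 1 − Σp² − ΣΣ 2p_i²p_j² (Botstein et al.). It is an illustration only, not the PowerMarker implementation, and the function names are hypothetical.

```python
def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content for one locus.

    Gene diversity minus the term 2 * p_i^2 * p_j^2 summed over all
    unordered pairs of distinct alleles (i < j).
    """
    n = len(freqs)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(n) for j in range(i + 1, n))
    return gene_diversity(freqs) - cross


# A biallelic SNP with equal allele frequencies.
print(gene_diversity([0.5, 0.5]))  # 0.5
print(pic([0.5, 0.5]))             # 0.375
```

PIC is never larger than gene diversity, and both increase with the number of alleles and with the evenness of their frequencies.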
Population structure was estimated from a subset of 13,998 SNPs selected from the 53,284 polymorphic SNPs used in the GWAS analysis, a number that the population structure software could handle easily. Population structure was determined using STRUCTURE software (http://pritch.bsd.uchicago.edu/structure.html) [42], which was run from the command line using the admixture model, a burn-in period length of 10,000 and 10,000 Markov-chain Monte Carlo (MCMC) iterations after burn-in. Ten independent runs were performed for each K from K = 1 to K = 5. The best value of K was chosen with the DeltaK method [24] by running the STRUCTURE HARVESTER software [43].
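The DeltaK criterion can be illustrated with a simplified sketch in which DeltaK for each K is the absolute second difference of the mean lnP(D) across runs divided by the standard deviation of lnP(D) at that K. This is an approximation written for illustration, not the STRUCTURE HARVESTER code, and the likelihood values below are invented.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: DeltaK} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each K to a list of lnP(D) values, one per replicate run.
    Runs at a given K must not all be identical, or stdev() would be zero.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # DeltaK is undefined at the smallest and largest K
        second_diff = abs(mean(lnp_by_k[k + 1])
                          - 2 * mean(lnp_by_k[k])
                          + mean(lnp_by_k[k - 1]))
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical lnP(D) values from two replicate runs per K.
runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-790.0, -792.0],
    4: [-785.0, -787.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because DeltaK needs both neighbouring values of K, a scan from K = 1 to K = 5 yields DeltaK only for K = 2 to K = 4.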

Phylogenetic and principal-coordinate analyses
Phylogenetic trees were produced from the genotyping data for the 53,284 SNP markers using both the unweighted neighbor-joining method and the hierarchical cladding method, based on the dissimilarity matrix calculated with the Manhattan index as implemented in the DARwin software (version 6.0.9) [44], and were visualized with Dendroscope (version 3.5.9) [45]. Inkscape 0.92 was used to make annotations and to apply visual effects to the phylogenetic tree. Principal-coordinate analyses (PCoA) were performed with GenAlEx version 6.503 [41].
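To make the dissimilarity step concrete, the sketch below builds a pairwise Manhattan dissimilarity matrix from SNP genotypes coded as 0, 1 or 2 copies of the alternate allele, with `None` for missing calls. The coding, the averaging over shared loci and the function names are assumptions made for illustration; DARwin's exact normalization is not reproduced here.

```python
def manhattan_dissimilarity(g1, g2):
    """Mean absolute genotype difference over loci called in both samples."""
    pairs = [(a, b) for a, b in zip(g1, g2)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no loci are called in both samples")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def dissimilarity_matrix(genotypes):
    """Symmetric matrix of pairwise dissimilarities with a zero diagonal."""
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = manhattan_dissimilarity(genotypes[i], genotypes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Three hypothetical accessions genotyped at four SNPs.
samples = [
    [0, 0, 2, 2],
    [0, 0, 2, None],
    [2, 2, 0, 0],
]
for row in dissimilarity_matrix(samples):
    print(row)
```

A matrix of this kind is the input to neighbor-joining tree construction and to principal-coordinate analysis.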

Species identification based on HRM genotyping
In addition to the preexisting identification of some of the germplasms by EBI, and the assessment of morphological features [15] and SNP information (Figs 1 and 2 and S1 Table), we obtained further confirmation of species through a real-time HRM-PCR protocol as described earlier [33]. We used five high-resolution melting markers (HRMs) to assign germplasms to different species (S5 Table) [15]. The 142 germplasms were classified into three species, C. annuum, C. frutescens and C. baccatum. The four markers C2_At5g19560, C2_At5g50020, Waxy and PepTrn showed a polymorphism that was highly specific for the three species (S2 Fig). Eighty-six C. annuum accessions were identified by the C2_At5g50020 marker, of which 12 were further confirmed by both Waxy and PepTrn, 28 by Waxy and 33 by PepTrn. The remaining 46 C. annuum accessions were identified by the single markers C2_At5g19560 (two), C2_At5g50020 (13), Waxy (eight) and PepTrn (32). Confirmation of C. baccatum status was made on the basis of the specific melting curve shape using PepTrn and C2_At5g50020 markers. C. frutescens accessions were likewise identified on the basis of melting curve shape using the markers Waxy and PepTrn. C2_At5g50020 also identified two C. frutescens accessions (SNU-142 and SNU-143).

Qualitative and quantitative morphological characterization
We evaluated the qualitative properties of plants, flowers, leaves and fruits based on descriptions used by the Rural Development Administration (RDA) gene bank of South Korea. Plant habits were spreading (12.9%), half-spreading (59.7%), erect (26.6%) and fasciculate (0.7%). Spreading types were collected from Amhara, Oromia and SNNPs at altitudes ranging between 1000 and 2570 meters above sea level (m.a.s.l.), while the altitude range for half-spreading plant types was 1150-2780 m.a.s.l., distributed in different locations of Amhara, Benishangul Gumuz, Oromia, SNNPs, Somali and Tigray. A relatively narrower altitude range (1200-2060 m.a.s.l.) was observed for the erect Capsicum types from Amhara, Oromia, SNNPs and Tigray. The only fasciculate Capsicum type was obtained from the Bibugn wereda of Amhara at an altitude of 1850 m.a.s.l. (S1 Table).
Measurement of plant growth parameters revealed variation between germplasms. Plant height (48-175 cm), stem thickness (6-33 mm), internode length (4-18.5 cm) and number of side branches (4-30) all evidenced variation. The seven tallest germplasms (140-175 cm in height) were collected from SNNPs at a mean altitude of 1300 m.a.s.l. Short to medium-height germplasms were collected from all Capsicum growing regions, with a wider altitude range (1000-2570 m.a.s.l.). Wider variation in stem thickness was observed in C. annuum (6.32-33 mm) than in C. frutescens (7.2-15.7 mm). Mean stem thickness for C. baccatum was 23.3 mm. The maximum mean internode length (11.4 cm) was measured for C. frutescens, while the minimum (9.23 cm) average internode length was observed for C. annuum. Similarly, the maximum average number of side branches below first node (17, with a range of 12-21) was counted in C. frutescens, followed by C. annuum (15, with a range of 4-30) (S7 Table, S1 Table and S3 Fig).
Estimates of phenotypic (PCVs) and genotypic coefficients of variation (GCVs), broad-sense heritability and genetic advance are shown in Table 1. Across the traits studied, the PCV values ranged from 39.6% for plant width to 99.6% for fruit weight. Similar to the latter, PCV values were high for number of seeds per fruit, fruit length, fruit width, pericarp thickness, fruit perimeter, internode length, leaf length and main stem length, with respective values of 81.5%, 65.6%, 58.7%, 58.3%, 53.3%, 53.0%, 52.4% and 50.6%. In contrast, number of side branches, leaf length, pedicel length, plant height, stem thickness and plant width showed comparatively lower PCV values (<50%). The GCV estimates were lowest (31.9%) for plant width and highest (98.6%) for fruit weight. High GCV values were also recorded for seeds per fruit, leaf width and fruit length. However, relatively low GCV values (<50%) were recorded for fruit width, pericarp thickness, fruit perimeter, internode length, number of side branches, leaf length, main stem length, pedicel length, plant height and stem thickness. Broad-sense heritability was greatest for fruit weight (98.1%), followed by length and width of leaves with respective values of 77.7% and 77.1%. The heritability values of the remaining traits ranged from 62.7% to 75.9%. Genetic advance (GA) as a percentage of the mean ranged from 1.7% for plant height to 107.7% for pericarp thickness (Table 1).

GBS and single-nucleotide polymorphisms
We identified 2,831,791 genome-wide SNPs in our germplasm panel. We filtered these by removing rare alleles (with prevalence less than 5%), alleles with high missing ratios (absent from more than 30% of the germplasms) and alleles with high heterozygosity (more than 80%). To explore the genetic diversity of the panel, we analyzed all the germplasms using 53,284 high-quality SNPs. The chromosomal distribution and proportion of polymorphic markers used for the computation are shown in S6 Table. We mapped the SNP density (number of SNPs per Mbp) and their distribution across the 12 chromosomes (Fig 2) and found that SNP densities varied across chromosomes. There was relatively high uniformity on chromosomes 3, 4, 5, 6, 8, 9, 10 and 12. On chromosome 2, a higher SNP density was found towards one end, and it was higher on both arms of chromosome 11. Relatively more SNPs were also recorded around the middle and one end of chromosome 7.
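The three filtering criteria can be sketched as a per-SNP test. The genotype coding (0, 1 or 2 copies of the alternate allele, `None` for a missing call), the function name and the example calls are assumptions made for illustration; the thresholds follow the text (minor allele frequency below 5%, more than 30% missing, more than 80% heterozygous).

```python
def snp_passes(calls, min_maf=0.05, max_missing=0.30, max_het=0.80):
    """Return True if one SNP survives the three filters described above.

    calls holds one genotype per germplasm: 0, 1 or 2 alternate-allele
    copies, or None where the genotype could not be called.
    """
    called = [g for g in calls if g is not None]
    missing_rate = 1 - len(called) / len(calls)
    if not called or missing_rate > max_missing:
        return False
    alt_freq = sum(called) / (2 * len(called))  # alternate-allele frequency
    maf = min(alt_freq, 1 - alt_freq)           # minor-allele frequency
    het_rate = sum(1 for g in called if g == 1) / len(called)
    return maf >= min_maf and het_rate <= max_het


# Hypothetical genotype calls for four SNPs across ten germplasms.
common_snp = [0, 0, 2, 2, 0, 2, 0, 0, 2, 2]
monomorphic = [0] * 10
all_heterozygous = [1] * 10
mostly_missing = [0, 2, None, None, None, None, 0, 2, 0, 2]

for name, snp in [("common", common_snp), ("monomorphic", monomorphic),
                  ("all het", all_heterozygous), ("missing", mostly_missing)]:
    print(name, snp_passes(snp))
```

Only the first SNP passes: the second has no minor allele, the third is heterozygous in every sample and the fourth is missing in 40% of samples.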

Genetic diversity
The amount and organization of genetic diversity among the model-based populations is presented in Table 2. A high allele number was recorded in C2 (5.00), and the mean major allele frequency was greater in C1 (0.30) than in C2 (0.28). The average expected heterozygosity (a measure of genetic diversity), observed heterozygosity and polymorphic information content (PIC; denoting allelic diversity and frequency) values were 0.74, 0.02 and 0.69 for C1 and 0.75, 0.03 and 0.70 for C2, respectively. The PIC value was 0.692 in 35 C. annuum and 0.701 in six

Analysis of molecular variance
To quantify the genetic diversity within and among subpopulations, we partitioned the total molecular variance into two clades according to the STRUCTURE simulation result. Averaged across the 130 germplasms, 92% of the total genetic diversity was partitioned between germplasms within the subpopulations, and only 8% was attributed to differences at the individual level (Table 3).

Population structure
We inferred the population structure of the 130 Ethiopian Capsicum germplasms using the program STRUCTURE 2.3.4 [46]. We carried out admixture model-based simulations by varying K from 1 to 5 with 10 independent runs per K using the 130 germplasms. The estimated likelihood (lnP(D)) was greatest for K = 2 (Fig 3), suggesting the presence of two main populations in the Capsicum germplasm panel [43]. The classification of germplasms into populations based on the model-based structure from STRUCTURE 2.3.4 (Fig 3 and S3 Table) showed that subpopulation C1 comprised 123 germplasms and subpopulation C2 comprised the remaining 7 germplasms. We tested the genetic variation within the subpopulations using the fixation index (FST) statistic for genetic differentiation. We observed a low average distance (HE) between individuals in the same clade in subpopulation C1 (0.05) and a higher HE in subpopulation C2 (0.07). The FST values for subpopulations C1 and C2 were 0.713 and 0.850, respectively.

Molecular phylogenetic and principal-coordinate analysis
The unrooted phylogenetic tree with two clades is consistent with the model-based population structure, in which C. frutescens germplasms were grouped separately from C. annuum (Fig 4). Clade 1 contained C. annuum accessions growing in an altitude range between 1000 and 2780 m.a.s.l. and consisting mainly of germplasms collected from different growing localities of the SNNPs region (35 germplasms), Amhara (47 germplasms), Oromia (37 germplasms), Benishangul (5 germplasms), Somali (1 germplasm) and Tigray (2 germplasms), which accounted for 24%, 33%, 26%, 3.5%, 0.7% and 1.4% of germplasms, respectively. The growing altitude range of C. frutescens (1200-1310 m.a.s.l.) was narrow compared with that of C. annuum (Fig 4). We also performed PCoA on 130 germplasms (Fig 4). This analysis largely supported the separation of the germplasms into two subpopulations fairly well distributed on the axes, with one variation as indicated by an arrow. Cluster A consisted mainly of C. annuum, a pattern also evidenced in the model-based genetic clustering using STRUCTURE and the phylogenetic tree. Germplasms in Cluster B were all from C. frutescens.

GWAS for selected traits
The genome-wide association results for fruit weight are summarized as Manhattan plots in Fig 5. With a Bonferroni correction threshold of 5% (-log10(P) > 6.03), the number of markers linked to various traits varied from a maximum of 187 for fruit length to a minimum of one for fruit shape index, leaf color, petal length, petal width and stem color (S8 Table). A total of 509 significant SNPs were identified, 81.53% of which were for fruit traits, 10.61% for leaf traits, 0.39% for petal length and width and 7.47% for stem-related traits. The largest fraction of significant SNPs (26.68%) was detected on chromosome 3, followed by chromosomes 8 and 9 with 16.11% and 13.56%, respectively. The smallest concentration of significant SNP markers was observed on chromosome 12 for fruit length, fruit number, fruit weight and petal width. SNP markers related to fruit traits (area, color, length, width, number, weight, shape index, number of locules, pericarp thickness and perimeter) were distributed across all 12 chromosomes. SNP markers for stem-related traits (internode length, hairiness, thickness, color and branching) were distributed on chromosomes 1, 2, 3, 4, 6, 9, 10 and 11, whereas those for leaf-related traits (length, width and color) were localized on chromosomes 2, 3, 6, 7 and 9 (S4 Fig).
For fruit weight (Fig 5), two regions containing 12 SNPs were detected on chromosome 3, from 126.3 to 144.9 Mbp and from 223.02 to 223.04 Mbp, and one region was detected on [...] (Table 4).

Discussion
The greater the genetic diversity of germplasm, the greater is the chance of success in breeding desirable strains. Knowledge of population structure and genetic diversity is essential for association mapping studies, genomic selection and the classification of individual genotypes into different groups. In the present study, we classified Ethiopian Capsicum germplasms into different species and analyzed their genetic diversity. According to our classification, the majority of Ethiopian pepper germplasms collected from diverse agro-ecologies are C. annuum, whose mature fruit is an integral ingredient of the local spice mixture called berbere, used to season many Ethiopian dishes. The green fruit of C. annuum is also a very important component of the daily diet. The brown chilli pepper type (C. annuum) is especially valued for its high pungency and for flavoring and coloring. Work by Berhanu et al. [47], Shimelis et al. [12] and Abrham et al. [13] demonstrated the prevalence and variability of C. annuum. We also recognized the presence of some C. frutescens collections, known locally as "mitmita", growing in some parts of the country. They are known locally for being highly pungent. Yayeh (1998) had previously described the existence of C. frutescens in Ethiopia [11]. In addition, the distribution of these two Capsicum species across Africa was described by Eshbaugh (1983), who summarized the evolutionary history of peppers and described how the genus was introduced to East Africa [48]. Similarly, Dagnoko et al. (2013) mentioned the importance of C. annuum and C. frutescens in West African countries [49].
Although morphological traits are important in the study of genetic diversity, because of their mostly polygenic nature and their dependence on various environmental factors, they may not always reflect real genetic variation [50]. Owing to their ability to recognize specific DNA sequences in closely related genotypes, SNP markers have been used successfully to estimate genetic diversity among different plants. GBS is a preferred high-throughput genotyping method involving targeted complexity reduction and multiplex sequencing to produce high-quality polymorphism data at a relatively low cost per sample [35]. The GBS method uses restriction enzymes coupled with DNA-barcoded adapters and can simultaneously perform SNP discovery and genotyping with or without reference genome sequences. GBS has been applied to various approaches for plant breeding and plant genetic studies, including linkage maps [20,36,51], genome-wide association studies [20,28], genomic selection [52] and genomic diversity studies [7]. We performed GBS for genotyping 142 germplasms of Capsicum species. Two enzymes, PstI and MseI, were used to reduce genome complexity, consistent with previous studies by Han et al. [20]. Using the CM334 genomic reference, SNP calling generated 53,284 high-quality SNPs. Transitions (72.45%) were more frequent than transversions (27.55%). The percentages of each SNP type in our study were 36.08%, 36 [...] [4], which also found a higher frequency of transitions than transversions.
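The transition and transversion classes mentioned above follow from base chemistry: a transition exchanges two purines (A, G) or two pyrimidines (C, T), whereas a transversion exchanges a purine for a pyrimidine. The sketch below classifies biallelic SNPs accordingly; the function names and the example SNP list are hypothetical and not taken from the study's data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snp_class(ref, alt):
    """Classify a biallelic SNP as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = {ref, alt}
    if ref == alt or not bases <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    same_class = bases <= PURINES or bases <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_percentages(snps):
    """Return (% transitions, % transversions) for (ref, alt) pairs."""
    n_ts = sum(1 for ref, alt in snps if snp_class(ref, alt) == "transition")
    pct_ts = 100 * n_ts / len(snps)
    return pct_ts, 100 - pct_ts


# Hypothetical SNP calls given as (reference allele, alternate allele).
example_snps = [("A", "G"), ("C", "T"), ("A", "T"), ("G", "A")]
print(ts_tv_percentages(example_snps))  # (75.0, 25.0)
```

Although there are twice as many possible transversions as transitions, transitions are usually observed more often, as in the percentages reported here.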
Heritability values are helpful in predicting the expected progress to be achieved through the process of selection; high heritability coupled with high genetic advance (GA) is an indicator of a high proportion of additivity in the genetic variance, and consequently suggests that a high genetic gain can be expected from selection [31]. The heritability values of the traits assessed in this analysis fell into two categories: very high (98.1%) for fruit weight and moderately high for the remainder, as illustrated in previous studies [9,31]. High GA (>20%) with moderate heritability values was observed for pericarp thickness, fruit width, pedicel length and fruit length. This pattern is partly supported by the findings of Usman et al. [53]. High PCV and GCV values for some fruit-related traits were also reported previously [12] and indicated the existence of substantial variability, ensuring ample scope for improvement of these traits through selection.
Our panel of germplasms exhibited a wide range of genetic diversity for different agro-morphological traits. Plant habits as a measure of plant architecture; growth parameters including plant height, stem thickness, internode length, number of side branches; and morphological traits such as number of flowers per axil, length and thickness of fruits, fruit weight and number of locules all showed variation among germplasms.
The model-based STRUCTURE analysis, the neighbor-joining phylogenetic tree and the PCoA all suggested that there are two genetically distinct subclasses among the Ethiopian Capsicum germplasms investigated in this study. The PCoA showed tight clustering within the first clade, composed of C. annuum, and the second clade, in which all C. frutescens are grouped. A previous diversity study of 39 cultivated Ethiopian C. annuum strains using AFLP distance estimation, however, showed four major clusters [3]. Although C. frutescens has been shown to have close affinity to C. annuum, they were grouped separately in this study [5]. Based on our data, the high diversity values in the two subpopulations suggest the existence of extensive genetic variation within them. The lower value of Ho (observed heterozygosity) as compared to He (expected heterozygosity) in the subpopulations indicated the presence of inbreeding in the majority of Ethiopian Capsicum germplasms. The average distance (HE) between individuals in the same clade was lower in subpopulation C1 (0.05) than in subpopulation C2 (0.07), indicating that C1 contained less variation. The genetic differentiation (FST) values for subpopulations C1 and C2 were 0.713 and 0.85, respectively, suggesting that the germplasms in the two clades have several genotype patterns. There was a significant correlation between some of the morphological traits, such as plant height and fruit width, as indicated by SNP-marker-based matrices (S7 Table). In summary, the model-based ancestry analysis, the phylogenetic tree and the PCoA strongly supported the possibility that the collection of Ethiopian Capsicum germplasms has two well-differentiated genetic populations and some admixtures.
In our research, additional information was provided by the GWAS analysis. As was shown earlier by Chaim et al. (2001) [54] and Han et al. (2016) [51], the highest numbers of SNP markers for various agronomic traits were detected on chromosome 3. From the total of 398 significant SNPs of selected traits (S8 Table), 10 SNPs from chromosome 8, 6 SNPs from chromosome 7 and 1 SNP from chromosome 3 were common for the traits of fruit weight and fruit length. The remaining major SNP marker (10) observed for fruit weight in our study was detected by Chaim et al. [54]. A recent report by Chunthawodtiporn et al. [55] demonstrated the distribution of fruit-trait QTL; the authors considered transverse and longitudinal section, fruit shape and blossom-end shape on all chromosomes except 4, 5 and 7. However, our study included more fruit-related traits, and our results suggest that significant SNP markers are found on all chromosomes. For leaf length, for which we identified various significant SNP markers on chromosomes 2, 3, 6, 7 and 9, co-localized QTL results were reported on chromosomes 6, 8, 9 and 11 by Han et al. [51] and on chromosomes 1, 2 and 3 by Chunthawodtiporn et al. [55]. We detected SNP markers for stem traits on seven of the same chromosomes reported by Han et al. [51].
GWAS identified 31 SNP markers for fruit weight in this study. On chromosome 3, the SNP markers were co-localized at two different locations with the QTL reports of Han et al. This study is the first detailed characterization of a large sample of the Ethiopian Capsicum germplasms, representing the six administrative regions covering all Capsicum-growing agroecologies in the country. Our morphological and molecular characterization provides insight into the genetic variability of Capsicum in Ethiopia.

Conclusions
The phenotypic and molecular characterization of Ethiopian Capsicum germplasms showed high variation that can be exploited for further breeding. Two distinct subpopulations were identified from the 130 accessions used in this study based on SNP information from our GBS library. Model-based population structure, phylogenetic analysis and PCoA revealed similar results. Probably because of the existence of a dominant informal seed-distribution system in the country, the majority of germplasms collected from different geographical areas were grouped into the same or different clades, without clear association with growing region. However, clade 1, which contained the majority of germplasms, consisted mainly of C. annuum and C. baccatum. Clade 2 was composed mainly of C. frutescens accessions collected from different growing localities of the SNNPs region. GWAS analysis helped us to further identify significant markers associated with important morphological traits, some of which were co-localized with reported QTLs. Besides providing additional information to identify candidate genes for the traits considered, this study reconfirms the validity of using GBS and downstream analysis for marker-assisted selection in Capsicum.