Genetic diversity and population structure analysis in a large collection of white clover (Trifolium repens L.) germplasm worldwide

White clover is an important temperate forage legume with high nutritional value. In the present study, 448 accessions collected worldwide were evaluated for genetic variation and polymorphism using 22 simple sequence repeat (SSR) markers. All the markers were highly informative: a total of 341 scored bands were amplified, of which 337 (98.83%) were polymorphic. The PIC values ranged from 0.89 to 0.97 with an average of 0.95. In the AMOVA, 98% of the variance was due to differences within populations and the remaining 2% was due to differences among populations. The white clover accessions were divided into different groups or subgroups based on PCoA, UPGMA, and STRUCTURE analyses. PCoA of the global white clover accessions indicated genetic differentiation between the natural and introduced ranges. UPGMA and STRUCTURE analyses showed only a weak correlation between genetic relationships and geographic distribution. The results of the present study provide a foundation for future breeding programs, genetic improvement, and the establishment of a core germplasm collection for white clover.


INTRODUCTION
White clover (Trifolium repens L.) is a cool-season, allotetraploid (2n=4x=32) perennial legume species (Cogan et al., 2006;Isobe et al., 2012). It can grow well in a wide range of soil and environmental conditions with proper management, and it has extended its range globally by wild and cultivated distribution from its natural range (Europe, Western Asia, and North Africa) (Griffiths et al., 2019). It is an important companion species in perennial grass pastures in temperate latitudes for its high nutritional quality and strong nitrogen fixation ability (Barrett et al., 2004;Brink et al., 1999;George et al., 2006;Randazzo, Rosso & Pagano, 2013;Zhang et al., 2010). White clover is an obligate outcrossing species and shows strong gametophytic self-incompatibility, which leads to high genetic heterozygosity in populations (Aasmo Finne, Rognli & Schjelderup, 2000;Zhang et al., 2010).
Evaluation of genetic variation is essential for conserving plant genetic resources, selecting genetically divergent parents for practical breeding, and preventing erosion of the genetic base of breeding populations (Dolanská & Čurn, 2004; Kölliker, Jones & Forster, 2001). Initial breeding efforts in white clover began in the 1930s, and substantial genetic improvement has been achieved over the last 60-70 years (Zhang et al., 2010). As white clover is an outbreeding species, its genetic improvement has generally depended on mass or recurrent selection and on polycrosses among multiple parental genotypes (George et al., 2006). White clover shows rich genetic diversity in many traits, such as leaf marks, cyanogenesis, herbage yield and leaf size. Although white clover is primarily propagated through clonal growth, high levels of genetic variation can still be detected in its populations (Gustine et al., 2002).
SSR markers have been applied successfully in white clover: SSR markers have been developed for the species (Kölliker et al., 2001), used to evaluate genetic diversity (George et al., 2006; Randazzo, Rosso & Pagano, 2013; Zhang et al., 2010) and used to construct genetic linkage maps (Barrett et al., 2004; Griffiths et al., 2013; Isobe et al., 2012; Jones et al., 2003; Zhang et al., 2008; Zhang, Sledge & Bouton, 2007). A dendrogram based on SSR data from ten white clover germplasm collections from China agreed closely with their geographical origins (Zhang et al., 2010). Cultivars from New Zealand were more distant from the other cultivars based on SSR data (Randazzo, Rosso & Pagano, 2013). DNA fingerprints have been constructed for 10 commercial white clover cultivars using SSR markers (Ma et al., 2020), which showed that SSR markers are valuable for identifying particular materials and could provide a basis for future studies of genetic background. The genetic variation of white clover has also been evaluated with other marker systems. A cluster analysis of 52 cultivars and accessions based on AFLP data only partially reflected their geographic origin (Kölliker, Jones & Forster, 2001). Eight white clover populations from different climates and geographic regions of North America showed high genetic similarity, which indicated a common European origin (Gustine et al., 2002). As highly informative molecular markers, SSRs can greatly accelerate breeding programs (Kölliker et al., 2001).
In the present study, 22 microsatellite markers (Griffiths et al., 2013;Griffiths et al., 2019) were used to evaluate the genetic variation among 448 white clover accessions collected from globally diverse origins. We analyzed the genetic diversity among accessions in terms of geographical origin. Our results have important implications for future breeding, germplasm improvement, and core germplasm collection in white clover.

DNA extraction and SSRs-PCR
Total DNA was extracted from fresh leaf samples using a DNA extraction kit (Tiangen Biotech Co., Beijing, China). SSR primers developed in previous studies (Griffiths et al., 2013) were used in the present study. In all, 22 primers (supplied by Sangon Biotech Co., Shanghai, China) were used in the analysis (Table S2). SSR-PCR amplification reactions were carried out in 20 µL volumes containing 1 µL genomic DNA (50 ng), 12.5 µL 2× Taq PCR mix (Tiangen Biotech Co., Beijing, China), 2 µL primers (1 µL forward primer and 1 µL reverse primer) and ddH2O to adjust the volume. The PCR program was as follows: 94 °C for 5 min, followed by 35 cycles of 94 °C for 1 min, 55 °C for 30 s, and 72 °C for 40 s, and a final extension at 72 °C for 10 min. The PCR products were separated by 8.0% polyacrylamide gel electrophoresis at 400 V for 2 h and visualized by silver staining.

Data scoring and statistical analysis
The amplified bands were scored as present (1) or absent (0), and a binary matrix was constructed for the SSR markers. The total number of bands (TNB), number of polymorphic bands (NPB) and percentage of polymorphic bands (PPB) were calculated. Polymorphic information content (PIC) was calculated using the formula $\mathrm{PIC} = 1 - \sum_i P_i^2$, where $P_i$ is the frequency of the i-th allele (Powell et al., 1996). The number of polymorphic loci (NPL), the percentage of polymorphic loci (PPL), the observed number of alleles (Na), the effective number of alleles (Ne), Nei's (1973) gene diversity (h), and Shannon's information index (I) were calculated with GenAlEx 6.5 (Peakall & Smouse, 2012) to evaluate the genetic diversity within accessions and populations. Genetic distance, the principal coordinate analysis (PCoA) and the analysis of molecular variance (AMOVA) were also computed with GenAlEx 6.5 (Peakall & Smouse, 2012). The unweighted pair-group method with arithmetic means (UPGMA) cluster analysis was performed on Nei's unbiased genetic distance matrix with MEGA X (Kumar et al., 2018). Population genetic structure was determined using the model-based Bayesian approach implemented in STRUCTURE 2.3.4 (Falush, Stephens & Pritchard, 2003; Falush, Stephens & Pritchard, 2007). The most likely number of populations (K) was tested from 1 to 10, with 10 iterations for each K. A burn-in of 500,000 replications was followed by 100,000 Markov Chain Monte Carlo (MCMC) replications. The optimal K capturing the major structure in the white clover data was determined using Structure Harvester (http://taylor0.biology.ucla.edu/structureHarvester/) (Earl & vonHoldt, 2012; Evanno, Regnaut & Goudet, 2005).
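The band statistics and the PIC formula above can be illustrated with a minimal Python sketch. The function names and the toy matrix are hypothetical and are not taken from the study. A band is treated here as polymorphic when it is present in some accessions but not all, which is a common convention and an assumption about how the bands were classified.

```python
def band_statistics(matrix):
    """Return (TNB, NPB, PPB) for a 0/1 matrix.

    matrix: list of rows, one row per accession, one column per scored band.
    Assumption: a band is polymorphic if it is neither present in all
    accessions nor absent in all of them.
    """
    n_bands = len(matrix[0])
    n_polymorphic = 0
    for col in range(n_bands):
        states = {row[col] for row in matrix}
        if len(states) > 1:
            n_polymorphic += 1
    ppb = 100.0 * n_polymorphic / n_bands
    return n_bands, n_polymorphic, ppb


def pic(allele_freqs):
    """PIC = 1 - sum(P_i^2), with P_i the frequency of the i-th allele."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Toy data (hypothetical): 3 accessions x 3 bands.
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
print(band_statistics(toy))
print(pic([0.25, 0.25, 0.25, 0.25]))
```

With n equally frequent alleles the formula reduces to 1 − 1/n, so the value rises toward 1 as the number of alleles grows. The reported PPB follows directly from the band counts: 100 × 337 / 341 rounds to 98.83.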

The polymorphism of SSR markers
In this study, a total of 341 scored bands were amplified using 22 SSR primers across 448 accessions, of which 337 (98.83%) were polymorphic (Table 1). The number of polymorphic bands per primer pair varied from 7 (gtrs1113) to 25 (gtrs749), with an average of 15.30 bands. All the primers had a high PIC value and detected a high level of polymorphism. The percentage of polymorphic bands ranged from 91.67% to 100%, and the PIC values ranged from 0.89 to 0.97 with an average of 0.95. The primers also showed high Nei's gene diversity (h) and Shannon's information index (I): h ranged from 0.198 to 0.345 with an average of 0.280, and I ranged from 0.339 to 0.520 with an average of 0.437 (Table 1).
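For readers unfamiliar with how h, Ne and I can be obtained from presence/absence scores, the sketch below shows one common convention for binary data. It assumes Hardy-Weinberg proportions at a diploid-like locus, so that the null allele frequency is the square root of the band-absence frequency. Whether the values in Table 1 were obtained with exactly this option is not stated in the text, and the function name is hypothetical.

```python
import math


def binary_band_diversity(band_freq):
    """Per-band diversity from the frequency of band presence.

    Assumption (not confirmed by the source): Hardy-Weinberg proportions,
    so q = sqrt(1 - band_freq) is the null-allele frequency and p = 1 - q.
    Returns (h, Ne, I):
      h  = 2pq                  expected heterozygosity (gene diversity)
      Ne = 1 / (p^2 + q^2)      effective number of alleles
      I  = -(p ln p + q ln q)   Shannon's information index
    """
    if not 0.0 <= band_freq <= 1.0:
        raise ValueError("band_freq must lie in [0, 1]")
    q = math.sqrt(1.0 - band_freq)
    p = 1.0 - q
    h = 2.0 * p * q
    ne = 1.0 / (p * p + q * q)
    shannon = -sum(x * math.log(x) for x in (p, q) if x > 0.0)
    return h, ne, shannon


# A band present in 75% of accessions gives p = q = 0.5 under this model.
print(binary_band_diversity(0.75))
```

Marker-level or population-level values would then be averages of these per-band quantities, which is why a mean Ne cannot be recovered from the mean h alone.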

Genetic diversity analysis
The genetic diversity was analyzed for the natural and introduced groups (Table 2). The percentage of polymorphic loci was higher in the natural group (98.24%) than in the introduced group (96.77%), corresponding to 335 and 330 polymorphic loci, respectively. The observed number of alleles was also higher in the natural group (1.982) than in the introduced group (1.956), as was the effective number of alleles (1.450 and 1.433, respectively). Nei's gene diversity was 0.280 for the natural group and 0.273 for the introduced group. Correspondingly, Shannon's information index was higher for the natural group (0.437) than for the introduced group (0.427).
The genetic diversity indices were also calculated for the subgroups (Table 3). The NPL values for subgroups ranged from 316 (Asian, natural group) to 335 (Australia, introduced group). The highest PPL (98.24%) was recorded for Australia from the introduced group, while the lowest (92.67%) was recorded for Asian from the natural group. The Na ranged from 1.891 in subgroup Asian from the natural group to 1.979 in subgroup Australia from the introduced group. The Ne varied from 1.419 to 1.449, values recorded in the Mediterranean and Europe subgroups of the natural group. The Nei's gene diversity values for …
Analysis of molecular variance (AMOVA) was performed to evaluate variance components among groups and subgroups (Table 3), which showed highly significant differences (P < 0.05). Across all accessions, 98% of the variance was due to differences among accessions within the groups and the remaining 2% was due to differences between the groups. Within the natural group, 97% of the variance was due to differences among accessions within the subgroups and the remaining 3% was due to differences among the subgroups. The AMOVA of the introduced group gave the same result: 97% of the variance was among accessions within subgroups and 3% was among subgroups.
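The partition of variance reported above can be made concrete with a minimal one-level AMOVA sketch in Python. It follows the standard sums-of-squares decomposition on a matrix of squared distances and omits the permutation test used to obtain P values. The function name and the toy data are hypothetical, and this is not the GenAlEx implementation.

```python
def amova_one_level(sq_dist, groups):
    """Percent of variance among and within groups.

    sq_dist: symmetric matrix of squared distances between individuals.
    groups:  list of group labels, one per individual.
    Requires at least two groups and more individuals than groups.
    A negative among-group component is returned as computed, not clamped.
    """
    n = len(groups)
    members = {}
    for idx, label in enumerate(groups):
        members.setdefault(label, []).append(idx)
    k = len(members)

    def pair_sum(idxs):
        # Sum of squared distances over all unordered pairs in idxs.
        return sum(sq_dist[i][j] for a, i in enumerate(idxs) for j in idxs[a + 1:])

    ss_total = pair_sum(list(range(n))) / n
    ss_within = sum(pair_sum(m) / len(m) for m in members.values())
    ss_among = ss_total - ss_within

    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n - k)
    # Average group size corrected for unequal sample sizes.
    n0 = (n - sum(len(m) ** 2 for m in members.values()) / n) / (k - 1)

    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    total = var_among + var_within
    return 100.0 * var_among / total, 100.0 * var_within / total


# Toy example (hypothetical): four individuals on a line, two per group.
coords = [0.0, 2.0, 10.0, 12.0]
d2 = [[(a - b) ** 2 for b in coords] for a in coords]
print(amova_one_level(d2, ["natural", "natural", "introduced", "introduced"]))
```

In the toy data the groups are far apart, so most variance lies among groups. The result reported for white clover is the opposite pattern, with 97%-98% of the variance within groups.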

Cluster and population structure analysis
The relationships among the accessions from the different groups and subgroups were further examined by UPGMA cluster analysis, PCoA, and genetic structure analysis based on genetic distance. Clear population differentiation was absent in the UPGMA analysis of the scored SSR markers, and each group contained accessions from various sources in the population structure analysis. According to the UPGMA dendrogram (Fig. 2A), all the accessions from the natural and introduced ranges could be classified into four clusters. In cluster I and cluster III, the accessions from the natural group and the introduced group fell into different subclades. In cluster I, 24 accessions from the introduced range clustered into one subclade and all belonged to subgroup Australia, while the accessions from the natural range came from subgroup European. In cluster III, 12 accessions from the natural range clustered into one subclade and came from subgroup European, while the accessions from the introduced range mainly came from Asia and Australia. In cluster II and cluster IV, the accessions from the natural and introduced ranges were closely related. Further, the subclades at the extremes of the dendrogram consisted of accessions from the natural range. The UPGMA dendrogram of all the accessions showed that the Australian and Asian accessions (introduced) had a closer genetic relationship with the European accessions (natural) (clusters I and III, Fig. 2A), whereas the American accessions (introduced) may be closer to the Mediterranean accessions (clusters II and IV, Fig. 2A). The smallest genetic distance (Table S3) was between two Asian accessions (Tr_058 and Tr_059), while the largest was between a European accession (Tr_252) and an Australian accession (Tr_318). According to the PCoA, all the accessions could be classified into natural and introduced populations (Fig. 2B).
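As background for the dendrogram described above, the following is a minimal pure-Python sketch of UPGMA clustering on a distance matrix. It is illustrative only and is not the MEGA X implementation used in the study. The function name, labels and distances are hypothetical.

```python
def upgma(labels, dist):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns (tree, merges): tree is a nested tuple of labels and merges is
    a list of (left, right, height) in merge order, where height is half
    the distance between the two merged clusters.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append((tree_a, tree_b, d_ab / 2.0))
        next_id += 1
    (tree, _size), = clusters.values()
    return tree, merges


# Toy distances (hypothetical) among four accessions.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 10, 10],
    [2, 0, 10, 10],
    [10, 10, 0, 4],
    [10, 10, 4, 0],
]
tree, merges = upgma(names, matrix)
print(tree)
print(merges)
```

Because UPGMA always joins the closest pair first, a pair such as Tr_058 and Tr_059 with the smallest distance in the matrix would form the first and shallowest node of the tree.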
STRUCTURE was run for K = 2-10 based on the distribution of the SSR data among the 448 accessions. Based on maximum likelihood and delta K (ΔK) values, the optimal number of groups was four ( …
The UPGMA dendrogram of the natural accessions showed that the accessions from Europe were distributed throughout the dendrogram (Fig. 3A). The accessions from Asia and Russia (European) mainly clustered at one end, and most of the Mediterranean accessions clustered at the other end. The Mediterranean accessions were genetically more distant from the Asian accessions (Fig. 3A). The PCoA showed a clustering pattern consistent with the UPGMA dendrogram (Fig. 3B). STRUCTURE was run for K = 2-10 based on the distribution of the SSR data among the 255 accessions. Based on maximum likelihood and ΔK values, the optimal number of groups was three (Fig. 3C and Fig. S2). Among them, Group N1 contained 66 accessions (43 accessions … The genetic structure revealed admixture in most accessions of each group, whereas the accessions in group 3, which mostly came from Europe, showed less admixture.
For the accessions from the introduced range, the subgroup Asian accessions mainly clustered in one clade, which also included several American and Australian accessions. Most of the American accessions also clustered within one clade. The Australian accessions were distributed throughout the dendrogram (Fig. 4A). The PCoA of the introduced accessions showed that the Asian accessions could be separated from the American accessions, while accessions of both subgroups were mixed with the Australian accessions (Fig. 4B). STRUCTURE was run for K = 2-10 based on the distribution of the SSR data among the 193 accessions. Based on maximum likelihood and ΔK values, the optimal number of groups was two (Fig. 4C and Fig. S3 … from subgroup Asian).
The genetic structure of the introduced accessions revealed less admixture than the natural accessions.
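The ΔK criterion used above to choose K can be sketched as follows. This is a minimal illustration that works from the mean ln P(D) of the replicate runs at each K and divides the absolute second-order difference by the standard deviation at that K. The log-likelihood values are invented for the example, and the function is not the Structure Harvester code.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute delta K from replicate ln P(D) values.

    lnp_by_k: dict mapping K to a list of ln P(D) values, one per run.
    Delta K is defined only for K values that have both neighbours
    (K - 1 and K + 1) and a non-zero standard deviation across runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if (k - 1) not in means or (k + 1) not in means:
            continue  # no second-order difference at the ends of the range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # delta K is undefined when all runs agree exactly
        second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
        delta[k] = second_diff / sd
    return delta


# Invented example: two runs per K (the study used 10 iterations per K).
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because ΔK needs both neighbouring K values, it cannot be evaluated at the smallest or largest K tested, so the tested range has to extend beyond the K values of interest.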

Marker polymorphism and genetic diversity analysis
Evaluation of genetic diversity in outbreeding forage species is important for breeding improvement (Dolanská & Čurn, 2004). White clover is a highly heterogeneous, outbreeding species (Cogan et al., 2006; Isobe et al., 2012), and substantial genetic variation among the white clover accessions was observed, as expected. In the present study, all 22 SSR markers were highly polymorphic. The mean PIC value (0.95) was higher than those of the primers used by Kölliker et al. (2001) and George et al. (2006), which were 0.68 and 0.66, respectively. It was also higher than SSR-based values reported for other genera and species, such as the genus Melilotus (0.87; Wu et al., 2016) and alfalfa (0.608; Wang et al., 2013). This may be because SSR markers, as codominant markers, are highly polymorphic (Griffiths et al., 2013; Wu et al., 2016). It might also result from the different environments (geographical origins) of the 448 accessions and the high rate of outcrossing in the species. White clover has a high level of genetic heterogeneity within natural and synthetic populations (George et al., 2006; Williams, Baker & Williams, 1987). In this study, the genetic diversity of the natural population (h = 0.280, I = 0.437) was slightly higher than that of the introduced population (h = 0.273, I = 0.427). The high level of genetic diversity may arise partly because the two diploid progenitors of white clover come from very different environments (extreme coastal or alpine habitats) (Griffiths et al., 2019), and partly because of multiple introduction events of white clover. Among the natural subpopulations, the European subpopulation had the highest level of genetic diversity (h = 0.279, I = 0.434), probably because Europe is the region of origin of white clover. The Australian subpopulation had a higher level of genetic diversity (h = 0.278, I = 0.434) than the other two introduced subpopulations.
This suggests that the Australian accessions may have more diverse sources, and that multiple introductions from different regions resulted in high genetic diversity in Australia. Genetic variation within populations (97%-98%) was higher than that among populations (2%-3%) in the present study. This result is consistent with previous studies of white clover based on RAPD (73% within populations) (Gustine & Huff, 1999), AFLP (84% within cultivars) (Kölliker, Jones & Forster, 2001) and SSR (86.5% within cultivars) (George et al., 2006). It is also consistent with results for other outcrossing species, such as perennial ryegrass (Bolaric et al., 2005; Van Treuren et al., 2005). The high intrapopulation variability can be attributed to allogamous reproduction, and the variation in white clover mainly comes from within populations.

Population genetic structure of white clover germplasm resources
White clover is a successful example of allopolyploidy-facilitated niche expansion, in which allotetraploidy has enabled the global radiation of previously confined specialist progenitor genomes (Griffiths et al., 2019). Its indigenous area is considered to comprise the whole of Europe, North Africa (Morocco and Tunisia) and the western half of its Asiatic distribution area. Moreover, the species has spread globally through animal, human and spontaneous dispersal (Daday, 1958). In our study, the PCoA of the global white clover accessions indicated genetic differentiation between the natural and introduced ranges. This is similar to the results of Jahufer et al. (2003), in which the clustering of white clover cultivars based on EST-SSR analysis also indicated a strong correlation with geographic origin. Cluster analysis of 52 white clover accessions based on AFLP data also showed a partial association between cultivar groups and geographic origin (Kölliker, Jones & Forster, 2001).
In contrast, clear population differentiation by geographic origin was absent in the UPGMA and STRUCTURE analyses, in which no group consisted exclusively of accessions from a single region. These results are consistent with George et al. (2006), who found no obvious distinction among white clover accessions from different geographical origins. The weak correlation between genetic relationships and geographic distribution agrees with reports in Eruca sativa (Golkar & Bakhtiari, 2020), Vicia faba (Ammar et al., 2015) and Camellia sinensis (Zhang et al., 2018). It may reflect the absence of a significant correlation between genetic distance and geographical distance (Golkar & Nourbakhsh, 2019). In the present study, the UPGMA dendrogram of all the accessions showed substantial overlap of different populations among the clusters. Moreover, high levels of genetic admixture were confirmed by the STRUCTURE analysis. This is probably largely due to outcrossing and self-incompatibility (Khan et al., 2009), human-mediated seed transfer (Daday, 1958; Wang et al., 2009), different biological dispersal patterns and evolutionary forces (Chapman et al., 2010) and random dispersal within a region (Golkar & Mokhtari, 2018). The genetic admixture of white clover may result from a complicated hybrid ancestry, and the high rate of outcrossing could lead to genetic admixture among adjacent regions (Griffiths et al., 2019).
White clover spread by natural means across most of the Asiatic mainland (Daday, 1958). This is supported by our results, in which the smallest genetic distance was found between Asian accessions. The level of genetic diversity of the Asian subpopulation was also the lowest among all the subpopulations. Moreover, white clover was introduced to Japan (Asia) from the Netherlands (Europe) in 1846, and the Japanese accession grouped with European accessions in one subclade in the present study. According to the literature (Daday, 1958; Gustine et al., 2002), white clover was introduced into America and Australia from Europe. However, the largest genetic distance was found between European and Australian accessions. This suggests that introduced white clover adapted to new environments by generating genetic variation. The genetic diversity of the European subpopulations from the natural range was high in our results. This abundant genetic variation could provide an excellent genetic basis for practical breeding. Hence, the European collections, especially those from coastal and alpine areas (Griffiths et al., 2019), could be recommended as candidate material for selecting a core germplasm collection. A core collection should maintain the vast majority of germplasm diversity (Lv et al., 2020), and the optimal sampling fraction for a white clover core collection needs further study.
The white clover accessions in the present study were divided into different groups or subgroups by the PCoA, UPGMA and STRUCTURE analyses. The differences among these groupings could be attributed to the different statistical principles of the methods (Gower, 1966; Lv et al., 2020; Pritchard et al., 2000). PCoA provides a classification based on the dissimilarity matrix of the original data and does not depend on the Hardy-Weinberg equilibrium assumption. STRUCTURE assigns accessions to subgroups probabilistically with a Bayesian clustering approach, and it is widely used to subdivide natural outcrossing populations. UPGMA clusters the accessions on the basis of genetic distance and shows more detailed relationships among the accessions. Overall, these three methods can be used together to provide a comprehensive understanding of the population genetic structure of white clover.
In conclusion, the findings of this study confirmed that global white clover accessions contain a high level of genetic diversity and that the correlation between genetic relationships and geographic distribution of white clover accessions is weak. Our results provide molecular evidence for breeding improvement, germplasm resource conservation and the establishment of a core germplasm collection for white clover.