Genetic diversity and population structure analysis in a large collection of Vicia amoena in China with newly developed SSR markers

Vicia amoena is a forage of high nutritional quality, similar to alfalfa. However, studies on the genetic background of V. amoena are scarce. In the present study, the genetic variation of 24 V. amoena populations was assessed with newly developed simple sequence repeat (SSR) markers. A total of 8799 SSRs were identified in the V. amoena genomic-enriched sequences, and the most abundant repeat number was four. A total of 569 sampled individuals were assayed to evaluate the genetic diversity of the V. amoena populations based on 21 polymorphic SSR primers. The polymorphism information content (PIC) ranged from 0.896 to 0.968, with an average of 0.931, which indicated that the markers were highly informative. Based on analysis of molecular variance, 88% of the variance occurred within populations, and the remaining 12% occurred among populations. The high degree of gene flow (Nm = 4.958) was also consistent with only slight differentiation among the V. amoena populations. The V. amoena populations were mainly clustered by steppe and mountain habitats based on principal coordinate analysis (PCoA) and STRUCTURE analysis. This indicated that the elevation and special habitat of geographical origins may be important factors affecting the clustered pattern of V. amoena populations. Neighbour-joining (NJ) analysis did not separate the populations well by geographical origin, which indicated that the genetic structure of V. amoena is complex and needs further study. Overall, our results showed that the newly developed SSR markers could benefit the V. amoena research community by providing genetic background information to help establish a foundation for breeding improvement and germplasm resource conservation.


Background
Vicia amoena is an herbaceous, allotetraploid (2n=24), perennial legume species native to Eastern Asia (Siberia, Mongolia, China, Japan, and Korea) that is especially widely dispersed in northern China [1,2]. It has high nutritional quality, strong abiotic stress tolerance, and wide adaptability. The protein content and the amino acid content of V. amoena are comparable to those of alfalfa (Medicago sativa) [3]. Moreover, V. amoena is also used as a traditional Chinese medicinal herb to treat oedema, rheumatoid arthritis and contracture [4]. However, genetic research on this important forage legume is scarce, with most researchers instead focusing on its chemical components. Unravelling the genetic diversity and population structure of V. amoena is very important for understanding its genetic background, which is a prerequisite for future genetic research, breeding programme development and genetic resource conservation.
Microsatellites, or simple sequence repeat (SSR) markers, are a powerful molecular tool for quantifying genetic variation in plants due to their high polymorphism [5]. SSR markers consist of tandemly repeated mono-, di-, tri-, tetra-, penta- or hexa-nucleotide units (1-10 nucleotide motifs); they exhibit locus-specific codominance and high heterozygosity, are distributed throughout the genome, and are easier to detect than other molecular markers [6]. Microsatellite markers have been successfully used in the assessment of many plants, e.g., Vicia faba [7,8], Campomanesia adamantium [9], Populus deltoides [10], Olea europaea [11], and Cunninghamia lanceolata [12].
*Correspondence: grasschina@126.com. College of Grassland Science and Technology, China Agricultural University, Beijing 100193, China. Full list of author information is available at the end of the article.
Overall, SSRs are one of the most informative molecular markers for plant genetic research, but the isolation of SSR markers traditionally based on probe hybridization is an experimentally demanding, labour-intensive, and economically costly process [13]. Advancements in sequencing and bioinformatic analysis techniques have provided good opportunities for generating new SSR markers. For example, next-generation sequencing (NGS) technology is a powerful tool that can be used for fast and cost-effective SSR discovery [14,15]. To date, a large number of SSR markers have been developed by high-throughput sequencing in many plants, such as Medicago sativa [16], Vicia sativa [15], Elymus sibiricus [17], Onobrychis viciifolia [18], Angelica gigas [19], Lentinula edodes [20], and Spondias tuberosa [21].
In the present study, we developed SSR markers using the HiSeq 4000 PE150 sequencing platform. We then used 21 polymorphic pairs to analyse the genetic diversity and population structure of 24 V. amoena populations (569 total individuals) in China, which may support studies on molecular diversity and breeding programmes. Our goals are (1) to assess the validity of these newly developed SSR markers and (2) to obtain an accurate representation of the genetic diversity and population structure of V. amoena.

Plant materials and DNA isolation
A total of 569 individuals from 24 sites throughout the natural distribution of V. amoena in China were collected in the present study (Table 1). Of these individuals, 281 individuals from 13 populations were collected in the field. The other 288 individuals from 11 sites were obtained from seeds provided by the National Herbage Germplasm Conservation Centre of China (Beijing, China). Genomic DNA was extracted from fresh or silica gel-dried leaf tissues using a Plant Genomic DNA Extraction Kit (Tiangen, Beijing, China) according to the manufacturer's protocol.

SSR marker detection, identification, and primer design
An Illumina paired-end library was constructed with the NEBNext® Ultra™ II DNA Library Preparation Kit (New England Biolabs (Beijing) Ltd., China) and sequenced on the Illumina HiSeq 4000 PE150 sequencing platform. Approximately 17.5 Gb of raw data was generated, and the raw sequence reads were filtered for primer/adaptor sequences and low-quality reads with the NGS QC Tool Kit [22]. Sequencing reads were assembled using SPAdes 3.6.1 software [23] with the parameter Kmer=95, and 198,659 contigs were finally obtained.
MISA software [24,25] was used to identify unique reads containing microsatellite repeats. The search was performed with a minimum repeat number of 5, 4, 3, 3 and 3 for di-, tri-, tetra-, penta-, and hexa-nucleotides, respectively. Primers were designed on the basis of the flanking sequences of the SSR loci by using Primer3. The primer design parameters were set as follows: the primer size was between 18 and 25 bp with an optimal size of 22 bp, the annealing temperature was between 55 and 65 °C with an optimal temperature of 60 °C, the PCR product size was between 80 and 300 bp, and default values were used for all other settings.
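The repeat thresholds above can be illustrated with a small regular-expression search. This is only a minimal sketch of perfect-repeat detection under those thresholds, not MISA itself; the function name `find_ssrs` and the test sequence are hypothetical, and compound or interrupted SSRs are not handled.

```python
import re

# Minimum repeat counts per motif length, as stated in the text
# (di: 5, tri: 4, tetra: 3, penta: 3, hexa: 3).
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif of fixed length followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are a shorter unit repeated (e.g. "AA", "ATAT"),
            # so each repeat is reported only under its shortest motif.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)


# Hypothetical test sequence: six AT repeats and four GAA repeats.
example = "GG" + "AT" * 6 + "CCG" + "GAA" * 4 + "T"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Start positions are 0-based. The shorter-unit filter means that a run such as ATATATATATAT is reported once as a dinucleotide repeat rather than again as a tetranucleotide repeat.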

M13-SSR PCR amplification
Twenty-one SSRs were selected through a preliminary experiment, a number of markers considered suitable for evaluating plant genetic diversity [11,26,27]. The twenty-one primer pairs (Table 2) that successfully amplified fragments in the 569 individuals were further characterized for polymorphisms using the M13-SSR PCR protocol. There were three primers in the M13-SSR PCR system: a forward primer, a reverse primer with an M13-tail (5'-CAC GAC GTT GTA AAA CGA C-3') at the 5' end, and a fluorescently labelled M13 universal primer. The first two primers were synthesized by Sangon Biotech (Shanghai, China) Co., Ltd., and the third primer was synthesized by Thermo Fisher Scientific (Shanghai, China). The M13 universal primer was labelled with one of four fluorescent dyes: FAM, NED, VIC, or ROX.
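As an illustration of the tailed-primer design, the sketch below prepends the M13 tail quoted above to the 5' end of a locus-specific primer. The helper name `add_m13_tail` and the example primer sequence are hypothetical and are not taken from Table 2.

```python
# M13 tail as quoted in the text, written 5'->3' without spaces.
M13_TAIL = "CACGACGTTGTAAAACGAC"


def add_m13_tail(primer):
    """Return the primer with the M13 tail attached to its 5' end."""
    primer = primer.upper().replace(" ", "")
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return M13_TAIL + primer


# Hypothetical 20-nt locus-specific primer, for illustration only.
tailed = add_m13_tail("ATGCCTGATCCAAGTTCGAT")
print(tailed, len(tailed))
```

Because the tail adds 19 nucleotides, the sized fragments from tailed amplification are correspondingly longer than the product size predicted from the locus-specific primers alone.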

Data analysis
The number of alleles (Na), the number of effective alleles (Ne), Shannon's information index (I), the observed heterozygosity (Ho), the expected heterozygosity (He), and the percentage of polymorphic loci (PPL) were determined to evaluate the genetic diversity of the SSRs and V. amoena populations. The genetic differentiation index (Fst) and genetic distance were calculated, and principal coordinate analysis (PCoA) and analysis of molecular variance (AMOVA) were performed, in GenAlEx 6.5 [28]. An NJ tree was constructed using MEGA X software [29]. Population genetic structure was determined using the model-based Bayesian approach implemented in STRUCTURE 2.3.4 [30,31]. The number of populations (K) was tested from 2 to 24, and 10 iterations were performed for each value of K. The length of burn-in Markov chain Monte Carlo (MCMC) replications was set to 500,000, followed by 100,000 MCMC replications in each run. The optimal K capturing the major structure in the V. amoena data was determined using Structure Harvester (http://taylor0.biology.ucla.edu/structureHarvester/) [32,33]. All tetraploid genotype data were converted into binary data using the POLYSAT v1.2 package in R [34]. Polymorphic information content (PIC) was calculated using the formula $\mathrm{PIC} = 1 - \sum_i P_i^2$, where $P_i$ is the frequency of the i-th allele [35].
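To make the per-locus statistics concrete, the following sketch computes Na, Ne, I, and PIC from a list of allele calls. PIC follows the formula given above; Ne = 1/∑p² and I = −∑p·ln(p) are the standard textbook definitions and are assumed here rather than taken from the study. The input is a simplified flat list of allele calls, whereas the study worked from binary-converted tetraploid data, and the function names are hypothetical.

```python
import math
from collections import Counter


def allele_frequencies(alleles):
    """Map each allele to its relative frequency in the list."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def locus_summary(alleles):
    """Return Na, Ne, Shannon's I, and PIC for one locus."""
    freqs = list(allele_frequencies(alleles).values())
    sum_sq = sum(p * p for p in freqs)  # sum of squared allele frequencies
    return {
        "Na": len(freqs),                              # number of alleles
        "Ne": 1.0 / sum_sq,                            # effective alleles (assumed definition)
        "I": -sum(p * math.log(p) for p in freqs),     # Shannon's index (assumed definition)
        "PIC": 1.0 - sum_sq,                           # formula from the text
    }


# Hypothetical allele calls (fragment sizes in bp) at a single locus.
print(locus_summary([152, 152, 156, 160, 160, 164, 168, 168]))
```

With this definition, PIC approaches 1 only when many alleles occur at similar frequencies, which is the pattern that PIC values above 0.9 would imply.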

The development and polymorphism of SSR markers
Genetic research on V. amoena has developed slowly due to a lack of sufficient genetic information and effective molecular marker systems. SSRs are extensively distributed throughout eukaryotic genomes and are one of the most important marker systems for plant genetic studies, including genetic diversity evaluation, marker-assisted selection (MAS) breeding, quantitative trait locus (QTL) mapping, and variety identification [36,37]. However, traditional SSR development methods are labour intensive [13]. At present, SSR markers developed by high-throughput sequencing are reliable and effective [19,38-41]. Genomic SSRs had not previously been developed in V. amoena, and a new set of highly polymorphic SSR markers was successfully developed in the present study. A total of 8799 SSRs were identified in V. amoena at the genome-wide scale, far more than the 1071 EST-SSRs developed by transcriptome sequencing in V. sativa [15]. Our work provides a powerful tool for genetic research on V. amoena in future breeding programmes and resource conservation. Among the SSR markers, trinucleotide repeats were the most abundant type (44.07%), similar to the relative proportions of EST-SSR motif types observed in V. sativa [15] and Medicago sativa [16].
The results indicated that the trinucleotide SSRs in the V. amoena genome are mainly located in exon regions. The frequent distribution of trinucleotide repeats in coding regions indicates the effects of selection and evolution [41]. The 21 SSR markers used in this study offered an informative and applicable approach for the evaluation of genetic relationships among the V. amoena populations. The genetic diversity parameter values indicated the high polymorphism of the 21 SSR markers. The observed heterozygosity (Ho) and expected heterozygosity (He) values also revealed a high degree of genetic variability among the V. amoena populations [11]. The values of PIC, Ho and He were all higher than those of the EST-SSRs reported in V. sativa [15]. This could be related to the different methods of SSR marker development and the different genetic backgrounds of various plant species.
Figure caption fragment: [...] (Table S2). (b) STRUCTURE output with K=10 (Fig. S1) showing the population structure among 569 individuals; vertical lines represent individuals.

Genetic differentiation and genetic structure of V. amoena populations
In the present study, a high level of genetic diversity (I=0.930) was detected among the V. amoena populations by the newly developed SSR markers. This value was higher than that detected by SRAP and ISSR markers in a previous report (I=0.397) [2]. Two possible reasons for this difference are that SSR markers are more effective than the other two types of markers [42] and that more natural populations were examined in the present study. Among the populations, those from Qinghai Province showed a lower level of genetic diversity, which may be due to their unique geographical location on the Qinghai-Tibet Plateau. The populations from tall mountain areas with high forest coverage at approximately 40°N had a higher level of genetic diversity. Genetic variation within the populations (88%) was higher than that among the populations (12%) in this study. These results were consistent with the characteristics of outcrossing species [43,44] and can be attributed to allogamous reproductive behaviour. The variation in V. amoena thus mainly comes from intrapopulation variation, which is consistent with V. amoena being a cross-pollinating plant.
The 24 V. amoena populations could be separated into three clusters via PCoA. The populations were mainly separated by habitat, i.e., mountain meadow, Leymus chinensis steppe, and undergrowth on mountains. The results indicated that the elevation of the geographical origin may be an important factor explaining the clustered pattern of V. amoena and that special habitat is another important factor. Similar results were found in the STRUCTURE analysis. The inferred subpopulations were broadly separated based on the best K value (K=10). The populations were mainly clustered among Leymus chinensis steppe, mountain areas with high forest coverage, and the Qinghai-Tibet Plateau. The results showed that the clusters of V. amoena were impacted by different landforms and the special topography of the Qinghai-Tibet Plateau. It would be worth exploring how the special topography affects the genetic differentiation of V. amoena in the future.
Additionally, the NJ analysis of V. amoena based on the entire SSR dataset revealed five major groups and showed an interesting pattern. The individuals from the populations on mountains were clustered with the populations from the Qinghai-Tibet Plateau. The other populations from the mountains and Leymus chinensis steppe were gathered in three clusters. The clustered pattern in the NJ analysis did not show clear boundaries among the different habitats and elevations. The high gene flow (Nm = 4.958) also weakened the differentiation among the V. amoena populations. The results indicated that the genetic structure of V. amoena populations was complex and affected by many factors, possibly including the special climatic conditions, habitats, and geomorphic conditions [2]; this needs further analysis.
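The link between gene flow and differentiation can be made explicit with Wright's island-model approximation, Nm ≈ (1 − Fst)/(4·Fst). The text does not state which estimator produced Nm = 4.958, so using this formula is an assumption; the sketch below only shows the arithmetic, and the function names are hypothetical.

```python
def nm_from_fst(fst):
    """Gene flow Nm implied by Fst under the island-model approximation."""
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


def fst_from_nm(nm):
    """Inverse relation: Fst implied by a given Nm."""
    if nm <= 0.0:
        raise ValueError("Nm must be positive")
    return 1.0 / (4.0 * nm + 1.0)


# Reported gene flow value from the text; the implied Fst is an
# illustration under the assumed formula, not a reported result.
print(round(fst_from_nm(4.958), 3))
```

Under this assumption, the reported Nm corresponds to an Fst of about 0.048, a low value that agrees with the weak differentiation among populations described above.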
In conclusion, our results confirmed that the V. amoena populations in China contained a high level of genetic diversity. There is a tendency for the genetic structure of the populations to be correlated with geographical origin and comprehensive environmental factors. Our findings and the SSRs newly developed in the present study provide a strong tool for breeding improvement and germplasm resource conservation in V. amoena.
Additional file 1: Table S1. The repeat numbers of different SSR motifs. Table S2. The proportion of each population in the genetic structure analysis. Figure S1. The best K value for the genetic structure based on STRUCTURE analysis.