Analysis of the genetic diversity and population structure of Salix psammophila based on phenotypic traits and simple sequence repeat markers

Salix psammophila (desert willow) is a shrub endemic to the Kubuqi Desert and the Mu Us Desert, China, that plays an important role in maintaining local ecosystems and can be used as a biomass feedstock for biofuels and bioenergy. However, the lack of information on phenotypic traits and molecular markers for this species limits the study of genetic diversity and population structure. In this study, nine phenotypic traits were analyzed to assess the morphological diversity and variation. The mean coefficient of variation of 17 populations ranged from 18.35% (branch angle (BA)) to 38.52% (leaf area (LA)). Unweighted pair-group method with arithmetic mean analysis of nine phenotypic traits of S. psammophila showed the same results, with the 17 populations clustering into five groups. We selected 491 genets of the 17 populations to analyze genetic diversity and population structure based on simple sequence repeat (SSR) markers. Analysis of molecular variance (AMOVA) revealed that most of the genetic variance (95%) was within populations, whereas only a small portion (5%) was among populations. Moreover, using the animal model with SSR-based relatedness estimated of S. psammophila, we found relatively moderate heritability values for phenotypic traits, suggesting that most of trait variation were caused by environmental or developmental variation. Principal coordinate and phylogenetic analyses based on SSR data revealed that populations P1, P2, P9, P16, and P17 were separated from the others. The results showed that the marginal populations located in the northeastern and southwestern had lower genetic diversity, which may be related to the direction of wind. These results provide a theoretical basis for germplasm management and genetic improvement of desert willow.


INTRODUCTION
Salix psammophila (desert willow) is a shrub mainly distributed in Northwest China and is endemic to the Kubuqi Desert and the Mu Us Desert. S. psammophila (Salix; Salicaceae) is counts, flow cytometry, and SSR analysis. Thus, SSR was ideal markers for studying genetic diversity and population structure of S. psammophila.
Animal-model analyses is a powerful approach to assess proportion of the additive genetic contribution to phenotypic trait variation and estimate heritability for the phenotypic traits (Klápště, Lstibůrek & El-Kassaby, 2014;Wilson et al., 2006). Animal model are linear mixed models that are based on pedigree or marker-inferred pairwise relatedness between individuals, and quantitative trait values of these individuals. It is widely used in wild animals (Husby, Visser & Kruuk, 2011;Réale et al., 2003), but rarely used in plants now  used an animal model with SSR-based relatedness to estimate in natural populations of S. herbacea.
Therefore, we determined the variance of phenotypic traits and genetic diversity based on SSR, and tried to compare and analyze the genetic diversity and population structure of S. psammophila based on phenotypic traits and SSR. To this end, the present study analyzed nine quantitative phenotypic traits in 491 genets of 17 S. psammophila populations to evaluate morphological variation and clustering of phenotypic traits. Generalized linear mixed models were used to examine the effect of sex on phenotypic traits. Animal-model analysis was used to assess the heritability of phenotypic traits. Moreover, 22 SSR markers were used to assess genetic diversity, neighbor-joining (NJ) phylogenetic analyses, principal coordinate analysis (PCoA) and population structure of S. psammophila. Further, those results would provide a basis for germplasm resource management as well as breeding programs.

Plant materials
Research materials were collected from the germplasm resource preservation library of S. psammophila established in Ordos Dalad, Inner Mongolia in 2008. More than 1,000 genets were collected 21 populations from different areas. S. psammophila genets were randomly sampled within a 100 Â 100 m 2 plot at each population. The distance between each genets were at least 50 m. About 50 genets were collected from each population by phenotypic traits. We collected genets by phenotypic traits, including PH, ground diameter (GD), tree shape, branch color, and so on. According to phenotypic traits investigation, 3-year-old stem of branches with large and strong shrubs, no diseases and pests genets were selected to collect in the germplasm resource preservation library of S. psammophila.
A total of 528 genets from 17 populations (P1-P17) were obtained from the germplasm collection to analyze population genetic structure and the association between SSR markers and specific phenotypic traits. There were 15 genets missing and recorded inaccurately in the phenotypic investigation, 22 genets of the same clones were detected by SSR markers; therefore, 491 genets were ultimately analyzed. The origins and locations of the 17 populations are shown in Table 1 and Fig. 1.

Determination of phenotypic traits
Phenotypic traits of 22-32 genets in each population were evaluated in 2013, including nine quantitative phenotypic traits. Sex was recorded of each genets. PH and GD were measured with meter stick, and BA was measured with a goniometer (Fig. S1). Each genet was measured five different branches for BA. In addition, 8-13 leaves from each genet that grew well and had no signs of disease were photographed, and MapGIS 6.7 software (China University of Geosciences, Beijing, China) was used to measure and record the parameters of leaf length (LL), leaf width (LW), LA, leaf petiole (LP), and leaf perimeter (LPE). PH was measured as the distance from the ground level to the tip of the plant; GD was determined as the diameter of the plant at a height of 30 cm from the ground; BA was measured as shown in Fig. S1.

DNA isolation and SSR analysis
Fresh and young leaves (0.2 g) were selected from each genet and total genomic DNA was isolated using a Plant Genomic DNA kit (TIANGEN, Beijing, China). The quality of DNA was verified by 1.2% agarose gel electrophoresis and a NanoDrop2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). The DNA was stored at -20 C until used for PCR.

Statistical analysis
Differences in the nine quantitative phenotypic traits among 17 populations were evaluated by one-way analysis of variance using Data Processing System software (Tang & Zhang, 2012). Duncan's multiple-range test was applied to compare the different populations. Quantitative traits were analyzed according to mean Euclidean distance based on genetic dissimilarity using DARwin 5 software (http://darwin.cirad.fr/), and the results were used to construct an UPGMA hierarchical clustering dendrogram of the 491 S. psammophila genets. Moreover, generalized linear mixed models were used to estimate the effect of sex on phenotypic traits . All statistical analyses were carried out in packages lme4 (Bates et al., 2014) and lmerTest (Kuznetsova, Brockhoff & Christensen, 2015) using R 3.2.3. Narrow-sense heritability (h 2 ) of phenotypic traits were estimated with the multivariate restricted maximum likelihood (REML) animal model (Kruuk, 2004) using ASReml-R v.3.0 (Butler et al., 2007). The method of relatedness estimation using SSR genotypes and estimation of genetic parameters referred to study on S. herbacea . The microsatellite DNA allele counting-peak ratios method (Esselink, Nybom & Vosman, 2004) was used to read tetraploid genotypes based on calculated ratios between peak areas. All tetraploid genotypes were read twice according to marker motifs. Although reading peaks is a little complicated, estimating polyploid genotypes using SSRs can clearly differentiate within genus by high throughput genotyping with multiplexed PCR (Guo et al., 2006;Pyne et al., 2018), and is also an auxiliary tool for determining ploidy (Wu et al., 2018). Representative two-microsatellite loci are shown in Fig. S2. The weak peaks and smeared peak were excluded from the final data analysis. AUTOTET (Thrall & Young, 2000) was used to calculate allelic richness, allelic richness within individuals, observed heterozygosity (H o ), expected heterozygosity (H e ), and fixation coefficient (F). PIC_CALC v.0.6 (Nagy et al., 2012) was used to calculate the polymorphism information content (PIC) of genets. Genetic distances (Nei, Tajima & Tateno, 1983) between genets were calculated using Populations version 1.2.31 software (http://bioinformatics.org/populations/). The distance method was using Nei, Tajima & Tateno (1983). GenALEX6.5 (Peakall & Smouse, 2012) was used for PCoA using the genetic distance matrix obtained from the Populations v. 1.2.31 software. MEGA3.1 was used to construct a NJ phylogenetic tree of the 491 genets and 17 populations based on the genetic distance matrix. The genetic structure of the 491 genets was analyzed using STRUCTURE 2.3.4 software (Porras et al., 2013), which is based on the Bayesian model. The length of the burn-in period and the number of Markov Chain Monte Carlo replications after this period was assigned as 10,000 with an admixture. The structure was run 10 times at each K by setting K from 2 to 14.

Clustering of phenotypic traits
A cluster analysis of seven phenotypic traits was carried out after removing PH and GD. PH and GD cannot be scored reliably because of stumping (3 years as a cycle) for shrubs, which are tufted. The seven phenotypic traits of the 491 genets were  standardized and mean Euclidean distances obtained by the UPGMA method were used to construct a dendrogram (Fig. 2). The cluster analysis showed that genets from P17 were obviously isolated in the dendrogram, indicating that they possess unique characteristics. Standardized average values of the seven phenotypic traits in different populations were used to generate a heat map of the 17 populations, which formed five clusters (Fig. 3). Cluster 1 was P17; Cluster 2 included P3, P15, and P16; Cluster 3 included P9, which separated into a single cluster; Cluster 4 consisted of P1, P2, and P11; and the remaining populations formed Cluster 5. The heat map showed that P15, P16, and P17 had the highest values for leaf phenotypic traits, whereas P1, P2, and P11 had the smallest. Populations in Clusters 1, 2, 3, and 4 were mostly from the southwestern and northeastern edges of the distribution area, suggesting that populations located in the periphery of the distribution area have phenotypic traits that are distinct from those of central populations.

Genetic diversity analysis
A total of 22 SSR primers were used to analyze the 491 genets of 17 S. psammophila populations from Northern China. The mean sample size of each population was 27.85. The mean number of alleles per locus (Na) was 7.36 (range: 5.36-8.32). The average number of different alleles per individual and locus was 2.33 (range: 2.14-2.43). The mean genotypic richness (mean number of four-allele genotypes at a locus) was 14.5 (range: 6.68-18.00). Mean He (0.65) was higher than mean H o (0.6), and mean PIC was 0.6 (Table 4). AMOVA revealed that most of genetic variance (95%) was within populations, with only a small portion (5%) occurring among populations (Table 5). NJ phylogenetic analysis, PCoA, and population structure The mean Nei's genetic distance of the 491 genets of S. psammophila was 0.387 (range: 0.002-0.641). A NJ phylogenetic tree was constructed based on the calculated genetic distances (Fig. 4). The cluster analysis showed that individuals from P1, P2, P16, and P17 were distributed more centrally than other populations in the dendrogram, confirming the findings from the cluster analysis based on phenotypic traits.  The population structure and NJ phylogenetic analyses indicated that P1 and P2 (from the northeastern part of the distribution area) and P9, P16, and P17 (from the southwestern part of the distribution area) were separated from other populations (Fig. 5). On the other hand, the results for P3, P15, and P11 based on SSR data were not consistent with the phenotypic trait analysis. Principal coordinate analysis based on SSR data revealed a large genetic diversity among the 491 genets. PCoA of first three axes explained 13.08% of the total variation (5.16%, 4.19%, and 3.74%, respectively), (Fig. 6). The clustering of genets in the PCoA was consistent with that in the NJ phylogenetic tree (Fig. 4), with P1, P2, P16, and P17 separated from the remaining populations. There were a lot of overlap genets of the center populations, and we inferred that those populations had more gene flow. STRUCTURE software was used to analyze the genetic population structure of the 491 genets. K = 2 indicated that the populations could be clearly divided into two distinct subgroups. The highest proportion of genets assigned to the second population were P9 (21.2%), P16 (32.3%), and P17 (87.8%) at K = 2 (Fig. 7). P17 population always showed obvious particularity according to the analysis was run for 2 K 5.

DISCUSSION
There are about 330-500 species of Salix in the world. Many species show phenotypic plasticity (Karp et al., 2011). Phenotypic diversity is influenced by genetic and environmental factors (Korior et al., 2013). Willows are deciduous plants with simple, alternate leaves, and exhibit large variations in leaf size (Karp et al., 2011). For example, S. viminalis has long slender leaves, while S. pentandra has large broader leaves. In the present study, the mean coefficient of variation of the nine phenotypic traits in 17 populations ranged from 18.35% (BA) to 38.52% (LA). The results demonstrated remarkable variation in the leaves of S. psammophila. However, sex did not strongly affect responses in this dioecious species (Table S3). Using the animal model with SSR-based relatedness estimated 17 populations of S. psammophila, we found relatively moderate heritability values for phenotypic traits (Table 3). Estimating using animal model had lower heritability than other approaches (Postma, 2014). Gouker (2016) reported marker-based estimations of narrow sense heritability of all traits were calculated ranged from 0.29 to 0.87 based on genotypic means. In present study showed that the heritability values of phenotypic trait from BA (h 2 = 0.255) to LW (h 2 = 0.031), suggesting that most of the trait variation were caused by environmental or developmental variation. In the present study, the experimental materials were selected from the germplasm resource preservation library with same growth environment. This might be a reason why the heritability values were higher than for S. herbacea . We suggested that the significant heritable variance in phenotypic traits might help S. psammophila to adapt to a changing environment. In other words, breeding values were also calculated based on heritability (Table S4). The largest five genets of breeding values were selected as useful germplasm material for phenotypic traits. A high degree of genetic diversity was also observed in the 491 S. psammophila genets on the basis of analysis using 22 SSR markers: the Na was 7.36 and mean He was 0.65, which was comparable to the genetic diversity reported in S. viminalis (Na = 13.46, He = 0.616) (Berlin et al., 2014), S. caprea (Na = 10, He = 0.58) (Perdereau et al., 2014), Populus nigra (Na = 15.9, He = 0.47) (Jiang et al., 2015). S. cheilophila distributed in the Kubuqi Desert and Mu Us Desert, and often formed willow forests with S. psammophila. Therefore, S. psammophila frequently hybridizes and its tetraploidy might have contributed to its high degree of genetic diversity. The AMOVA indicated that most of the variation in S. psammophila (95%) occurred within populations. This is consistent with the analysis of SSR genetic diversity in S. viminalis (94%) (Zhai et al., 2016), and P. nigra (90.8%) (Jiang et al., 2015). The geographical distance between the two farthest populations (P3 and P17) is only 438 km; this would give rise to more gene flow between populations through pollen and seed dispersal. Thus, dioecious pollination in S. psammophila restricted to a comparatively small area could result in increased gene flow across generations and cause remarkable variations within populations (Hamrick & Godt, 1996).
Comparison of cluster analyses based on nine phenotypic traits as well as SSR data of S. psammophila populations showed that P1, P2, P9, P16, and P17 were distinct from the other populations. This was confirmed by analysis of population structure and PCoA analysis based on SSR data. Notably, all these separated populations were from the periphery of the distribution area-P1 and P2 from the northeast, P9 from the center, P16 and P17 from the southwest. The 17 populations of S. psammophila basically covered all of them in the Kubuqi Desert and Mu Us Desert, and the populations formed belt distribution from the northwest to the southeast. The Salicaceae species are dioecious, and exhibit a combination of anemophilous and entomophilous pollination (Peeters & Totland, 1999). S. psammophila mainly depends on wind pollination; its flowers usually bloom in early April and fruits ripen in early May. The seeds of S. psammophila are very small, with a 1,000-seed weight of approximately 0.08 g, which can facilitate dispersal by wind . Wind direction, speed, and persistence play important roles in pollen transport, particularly when weak winds prevail for a considerable part of the year (Damialis et al., 2005). However, the northwest winds mainly occur in this region from April to May. This hinders genetic flow between P1, P2, P16, and P17 (marginal populations and upstream in wind direction) and other populations. In present study, we also found that these populations had relatively smaller coefficient of variation in phenotypic traits (21.16%, 20.32%, 22. 41%, and 19.72% in P1, P2, P16, and P17, respectively), and lower PIC based on SSR (0.59, 0.54, 0.59, and 0.51 in P1, P2, P16, and P17, respectively). Although P9 was also located on the marginal parts of the distribution area, this population was rich in S. psammophila in the upstream direction of the wind. Therefore, P9 had higher coefficient of variation in phenotypic traits (23.41%) and PIC (0.6) than P1, P2, P16, and P17. The results showed that the marginal populations located in upstream direction of wind had lower genetic diversity, which may be related to the pollination pattern of the wind. P3, P11, and P15 were clustered separately on the basis of distinct phenotypic traits but not on the basis of SSR data. This can be explained by the fact that the expression of plant traits is the result of the interaction between genotypes and the internal and external environments. Moreover, SSR markers are a simple motif that cannot cover the entire genome, whereas phenotypic traits are influenced by specific gene loci (He et al., 2015;Zhang et al., 2010).
In the cluster analysis of all individuals on the basis of phenotype (Fig. 2) and NJ phylogenetic analysis (Fig. 4) showed that the genets from the P17 population were obviously isolated from all individuals, showing more obvious particularity. First, we found that P17 population showed larger leaves in the phenotypic study. P1-P16 were located in the Kubuqi Desert and Mu Us Desert with typical desert habitats, while the P17 genets were sampled around Haba Lake with wetland habitats. Environmental heterogeneity in the original populations may induce adaptive genetic differentiation, which in turn results on phenotypic variability. The studied on populations of S. herbacea can respond to shifts in snowmelt by plastic changes in phenology and leaf size . Potential drivers of phenotypic variability in Salix, such as community facilitation and competition , nutrient availability (Little et al., 2016), and abiotic stress , which response in phenotypic plasticity can help plant adapt to the new environment. Moreover, Sedlacek et al. (2014) indicated that soil inocula from different origins affected seed germination rates and growth of S. herbacea and interaction with soil biota may constrain migration of S. herbacea to higher altitudes. S. psammophila performs well in dry conditions and can be used to cope with soil erosion, plant-soil interactions also may play in the habitat suitability. Second, P17 was located in the southwestern part of the distribution area, with upstream wind direction, and at a relatively greater distance from the surrounding population P16 (approximately 47.733 km). Gene flow was mainly found to be affected by the distance between the donor and acceptor plants (Rognli, Nilsson & Nurminiemi, 2010). Therefore, these factors may lead to low gene flow with other populations. The present study showed that the genets from the P17 population had relatively smallest coefficient of variation in phenotypic traits (19.72%) and PIC (0.51), and therefore had lowest genetic diversity, showing more obvious particularity of characteristics.

CONCLUSIONS
In the present study, we analyzed 491 genets from 17 natural populations of S. psammophila to evaluate genetic diversity, population structure based on morphology and SSR markers. Our results suggest that leaf variation of S. psammophila varies richly. The frequently hybridize and the tetraploidy of S. psammophila may have contributed to its high degree of genetic diversity. The results showed that the marginal populations located in the northeastern and southwestern had lower genetic diversity, which could be related to the direction of wind. Moreover, we found relatively moderate heritability values for phenotypic traits using the animal model with SSR-based relatedness estimated, suggesting that most of the trait variation were caused by environmental or developmental variation. Nonetheless, the findings presented here provide a theoretical basis for improving S. psammophila germplasm resources as well as useful information for breeding programs. Further research should advance in understanding the potential drivers of phenotypic variability and pay more attention to marginal populations of S. psammophila.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This study was supported by the National Natural Science Foundation of China (grant numbers 31160167) and Scientific Research Project of National Forestry Public