The Genetic Diversity and Geographic Differentiation of the Wild Soybean in Northeast China Based on Nuclear Microsatellite Variation

In this study, the genetic diversity and population structure of 205 wild soybean accessions from the Northeast China core collection, grouped into nine latitude populations and nine longitude populations, were evaluated using SSR markers. A total of 973 alleles were detected at 43 SSR loci, with an average of 22.628 alleles per locus. The mean Shannon information index (I) and the mean expected heterozygosity were 2.528 and 0.879, respectively. At the population level, the 42°N and 124°E regions had the highest genetic diversity among the latitude and longitude populations, respectively. The greater the difference in latitude, the greater the genetic distance, whereas no similar trend was found among longitude populations. Populations were assigned to three main clusters (1N, <41°N-42°N; 2N, 43°N-44°N; and 3N, 45°N->49°N). AMOVA showed that genetic differentiation among latitude and longitude populations was 0.088 and 0.058, respectively, and that the majority of genetic variation occurred within populations. The Mantel test revealed that genetic distance was significantly correlated with geographical distance (r = 0.207, p < 0.05). Furthermore, spatial autocorrelation analysis revealed significant spatial structure (ω = 119.58, p < 0.01), with the correlation coefficient (r) decreasing as distance increased within a radius of 250 km.


Introduction
The annual wild soybean (Glycine soja Sieb. and Zucc.), the direct progenitor of the cultivated soybean (Glycine max (Linn.) Merr.), is a predominantly self-pollinating species [1,2]. It is widely distributed across most provinces of China, with the exceptions of Qinghai, Xinjiang, and Hainan [3]. In Northeast China, the wild soybean is well known for its abundant populations, high population density, and rich phenotypic types [4]. A total of 48 wild soybean in situ reserves have been established in China, 14 of which are located in Northeast China. A total of 8518 wild soybean accessions have been conserved ex situ in the National Gene Bank, and nearly half of them were collected from Northeast China [5]. Some reports have suggested that Northeast China is probably an important center of wild soybean diversity [6][7][8].
Genetic diversity is essential to population stability and is the basis of species evolution [9]. The genetic diversity and population structure of wild soybeans have been described in many reports [10][11][12][13]. The wild soybean in Northeast China, a very important ecotype, has been used in many previous studies [14][15][16][17][18]. These studies either compared the genetic diversity of wild soybeans in Northeast China with that in other areas using morphological traits and various molecular markers, or analyzed the distribution patterns, origin, evolution, classification, and other aspects of wild soybeans from specific areas. Their results have not been consistent, because different accessions (or populations), sample sizes, and analysis methods were used. SSR markers have proven to be a reliable tool for assessing the diversity of the wild soybean [19]. However, the genetic diversity and population structure of the wild soybean across Northeast China, treated as a single population, have not been fully characterized with molecular markers; in particular, the genetic differentiation among geographical populations within this region has not been well investigated.
The objectives of this work were to determine the extent of genetic variation in the wild soybean in Northeast China, to elucidate the geographical structure and genetic differentiation within and between latitude or longitude populations, and, ultimately, to provide valuable information for the scientific protection and efficient utilization of wild soybean resources in Northeast China.

Materials and Methods
2.1. Plant Material. All the wild soybean accessions in this study were preserved in the Gene Bank of Jilin Academy of Agricultural Sciences. A total of 205 wild soybean accessions from Northeast China were selected from the core collection established by Zhao et al. [20]. The collection sites were located within 39-52°N and 119-133°E and covered 84 counties or districts. Nine latitude populations and nine longitude populations were defined at 1° intervals: an interval formed its own population when it contained more than 10 samples; otherwise, its samples were combined with an adjacent latitude or longitude population (see Figure 1 and Supplementary Table 1).
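The grouping rule above can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: each sample is placed in the band given by the floor of its coordinate, and only sparse bands at the low and high ends of the range are pooled inward, which yields open-ended groups comparable to <41°N and >49°N. The function name `assign_bands` and the `min_size` parameter are hypothetical.

```python
import math
from collections import Counter


def assign_bands(coords, min_size=11):
    """Assign each coordinate (decimal degrees) to a 1-degree population label.

    Hypothetical rule: band = floor(coordinate); a band stands alone when it
    holds at least min_size samples (i.e. more than 10). Sparse bands at the
    two ends of the range are pooled inward until the pooled group is large
    enough. Sparse interior bands are not handled in this sketch.
    """
    counts = Counter(math.floor(c) for c in coords)
    bands = sorted(counts)

    # Pool the low (southern/western) tail upward.
    lo, pooled = 0, counts[bands[0]]
    while pooled < min_size and lo < len(bands) - 1:
        lo += 1
        pooled += counts[bands[lo]]

    # Pool the high (northern/eastern) tail downward, never reaching lo.
    hi, pooled = len(bands) - 1, counts[bands[-1]]
    while pooled < min_size and hi - 1 > lo:
        hi -= 1
        pooled += counts[bands[hi]]

    labels = {}
    for i, band in enumerate(bands):
        if lo > 0 and i <= lo:
            labels[band] = f"<={bands[lo]}"
        elif hi < len(bands) - 1 and i >= hi:
            labels[band] = f">={bands[hi]}"
        else:
            labels[band] = str(band)
    return [labels[math.floor(c)] for c in coords]
```

For example, `assign_bands([39.5] * 3 + [40.2] * 4 + [41.1] * 6 + [42.3] * 12 + [43.0] * 15 + [44.9] * 2 + [45.5] * 9)` pools 39-41 into one southern group and 44-45 into one northern group, while 42 and 43 stand alone.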
2.2. SSR Analysis. Genomic DNA was extracted from a single fresh leaf of each accession using a modified CTAB method [21]. Forty-three SSR markers (see Supplementary Table 2), selected from 60 core loci [22] distributed across the 20 genetic linkage groups, were used to detect nuclear DNA variation. The primer sequences and their linkage group locations are available at https://www.soybase.org/dlpages/#soybasedata. PCR was performed on a T100™ thermal cycler (Bio-Rad, USA) with the following program: initial denaturation at 95°C for 5 min; 35 cycles of denaturation at 95°C for 45 s, annealing at 52-57°C for 45 s, and extension at 68°C for 45 s; and a final extension at 72°C for 10 min.
Amplified products were separated by electrophoresis on 6% denaturing polyacrylamide gels and silver-stained; this system resolves fragments of 60-500 bp with a resolution of 2 bp. Band sizes were estimated from their migration distances relative to a 100 bp DNA ladder (MBI Fermentas) using AlphaView software (version 1.3.0.7).
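Sizing a band from its migration distance conventionally relies on the roughly linear relationship between the logarithm of fragment size and migration distance. The sketch below interpolates on that semi-log scale between the two ladder bands that bracket a sample band. It illustrates the general principle only; it is not AlphaView's actual algorithm, and the function name `estimate_size` is hypothetical.

```python
import math


def estimate_size(distance, ladder):
    """Estimate fragment size (bp) from migration distance.

    ladder: list of (migration_distance, size_bp) pairs for the marker bands.
    Assumes log10(size) varies linearly with distance between adjacent
    ladder bands (a common approximation, not a vendor algorithm).
    """
    pts = sorted(ladder)  # increasing distance, i.e. decreasing size
    for (d1, s1), (d2, s2) in zip(pts, pts[1:]):
        if d1 <= distance <= d2:
            frac = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("distance lies outside the ladder range")
```

With made-up ladder positions `[(10, 500), (20, 400), (30, 300), (40, 200), (50, 100)]`, a band at distance 45 falls midway between the 200 bp and 100 bp bands and is estimated at their geometric mean, about 141 bp, rather than the arithmetic midpoint of 150 bp.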

2.3. Data Analysis.
The amplified fragments for each SSR marker were scored by their migration differences, and the data were formatted accordingly in Microsoft Excel. The number of alleles ($N_a$), Shannon-Weaver index ($I$), expected heterozygosity ($H_e$), observed heterozygosity ($H_o$), fixation index ($F_{is}$), genetic differentiation coefficient ($F_{st}$), genetic identity and genetic distance, analysis of molecular variance (AMOVA), Mantel tests, and spatial autocorrelation coefficients were computed with GenAlEx 6.5 [23]. Diversity was evaluated with $I = -\sum_i p_i \ln p_i$ and $H_e = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th allele. Because inbreeding generally makes $H_o$ lower than $H_e$, the fixation index was calculated as $F_{is} = 1 - H_o/H_e$, and the outcrossing rate ($t$) was calculated as $t = (1 - F_{is})/(1 + F_{is})$ [24]. A dendrogram was constructed from the matrix of Nei's genetic distances [25] using NTSYSpc 2.1 [26]. Spatial autocorrelation analysis was performed with the single-population spatial option, using distance classes of 100 km, 999 permutations, and 999 bootstrap replicates.
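The formulas above can be sketched for a single locus as follows. This is an illustration of the equations, not GenAlEx's code; the function names are hypothetical, genotypes are assumed to be diploid allele-size pairs, and $H_e$ is the uncorrected form given in the text.

```python
import math
from collections import Counter


def outcrossing_rate(fis):
    """Outcrossing rate t = (1 - Fis) / (1 + Fis)."""
    return (1 - fis) / (1 + fis)


def locus_stats(genotypes):
    """Diversity statistics for one SSR locus in one population.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid
    individual; a homozygote has allele_a == allele_b.
    """
    alleles = [allele for pair in genotypes for allele in pair]
    total = len(alleles)
    freqs = [count / total for count in Counter(alleles).values()]

    shannon = -sum(p * math.log(p) for p in freqs)  # I = -sum p_i ln p_i
    he = 1 - sum(p * p for p in freqs)  # He = 1 - sum p_i^2
    ho = sum(a != b for a, b in genotypes) / len(genotypes)  # share of heterozygotes
    fis = 1 - ho / he if he > 0 else float("nan")  # undefined for a monomorphic locus
    return {"Na": len(freqs), "I": shannon, "He": he, "Ho": ho,
            "Fis": fis, "t": outcrossing_rate(fis)}
```

Multi-locus summaries would typically be obtained by averaging per-locus values. As a consistency check, `outcrossing_rate(0.987)` is about 0.0065, which matches the 0.7% outcrossing rate reported for the mean $F_{is}$ of 0.987.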
The population genetic structure was inferred using STRUCTURE 2.3.4 [27]. k was set from 1 to 12, and ten runs were performed for each value of k to test […]

[…] (see Table 1). At the population level (see Table 2) […] the smaller the latitude difference, the higher the genetic identity (see Tables 3 and 4); however, no similar trend was found among longitude populations. Two major geographical groups (N and S) are evident in the UPGMA dendrogram: group N consists of the four northern latitude populations (46°N->49°N), and group S consists of the five southern latitude populations (<41°N-45°N) (see Figure 2). These results indicate that the genetic diversity of wild soybean accessions in Northeast China is related to their latitudinal origin.
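Genetic identity and distance between two populations can be illustrated with Nei's standard measures. The sketch assumes the 1972 standard form, $I = J_{xy}/\sqrt{J_x J_y}$ and $D = -\ln I$, with each $J$ averaged over loci; the text cites Nei [25] without naming the variant, so this choice is an assumption, and the function name is hypothetical. A matrix of such distances is what UPGMA clustering would turn into a dendrogram.

```python
import math


def nei_identity_distance(pop_x, pop_y):
    """Nei's standard genetic identity (I) and distance (D) between two populations.

    pop_x, pop_y: lists with one entry per locus (same locus order in both),
    each a dict mapping allele -> frequency within that population.
    """
    n_loci = len(pop_x)
    # J_x and J_y: sums of squared allele frequencies, averaged over loci.
    jx = sum(sum(f * f for f in locus.values()) for locus in pop_x) / n_loci
    jy = sum(sum(f * f for f in locus.values()) for locus in pop_y) / n_loci
    # J_xy: sum of x_i * y_i averaged over loci; alleles absent from y count as 0.
    jxy = sum(
        sum(f * ly.get(allele, 0.0) for allele, f in lx.items())
        for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    identity = jxy / math.sqrt(jx * jy)
    # D is infinite when the populations share no alleles (identity == 0).
    distance = -math.log(identity) if identity > 0 else math.inf
    return identity, distance
```

For instance, a population fixed for allele `a` compared with one carrying `a` and `b` at equal frequency gives $I = 0.5/\sqrt{0.5} \approx 0.707$ and $D = \tfrac{1}{2}\ln 2 \approx 0.347$.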

Population Structure and Genetic Differentiation. STRUCTURE was run to infer the genetic structure of each predefined latitude and longitude population. Δk was highest at k = 3, indicating three main clusters (see Table 5). Among the latitude populations, 83.8% of individuals from the <41°N region and 50.2% of individuals from the 42°N region were assigned to Cluster1N, most individuals from the 43°N-44°N regions to Cluster2N, and individuals from the 45°N->49°N regions to Cluster3N. These results were broadly consistent with the hierarchical cluster analysis (see Figure 2). Some admixture was found among the 41°N-45°N regions, which are probably important transitional areas. Among the longitude populations, most individuals from the <122°E region were separated from all others and formed a cluster (Cluster1E); the 123°E, 125°E, and 129°E regions were assigned to Cluster2E, and the 127°E, 128°E, and 130°E regions to Cluster3E. Admixture was widespread among the nine predefined longitude populations. The 42°N, 124°E, and 126°E regions were distinctive in that no single cluster accounted for 60% or more of their individuals.
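The choice of k = 3 rests on Δk, usually computed following Evanno et al. (2005). A minimal sketch, assuming the ln P(D) values from the replicate STRUCTURE runs for each k are available: this simplified form takes the second-order difference of the run means, $|\bar L(k+1) - 2\bar L(k) + \bar L(k-1)|$, and divides it by the standard deviation of ln P(D) across runs at k. Evanno et al. take the second difference run by run before averaging, so values can differ slightly; the function name and input layout are hypothetical.

```python
import statistics


def delta_k(lnp_by_k):
    """Delta k from STRUCTURE ln P(D) values.

    lnp_by_k: dict mapping k -> list of ln P(D) values from replicate runs
    (at least two runs per k). Delta k is returned only for k values whose
    neighbours k - 1 and k + 1 were also run.
    """
    means = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k, runs in lnp_by_k.items():
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = statistics.stdev(runs)
        if sd == 0:
            continue  # Delta k is undefined when all runs agree exactly
        second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
        result[k] = abs(second_diff) / sd
    return result
```

The k with the largest Δk, for example `max(d, key=d.get)` for `d = delta_k(...)`, is taken as the most likely number of clusters; the end points of the tested range (here k = 1 and k = 12) never receive a value.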
The $F_{st}$ value was used to evaluate the genetic differentiation of wild soybean populations at the latitude and longitude scales in Northeast China (see Tables 3 and 4). Pairwise $F_{st}$ values ranged from 0.040 to 0.183 among latitude populations and from 0.034 to 0.132 among longitude populations. In general, moderate differentiation ($0.05 < F_{st} < 0.15$) [29] was observed between most latitude and longitude populations, and differentiation between adjacent populations was relatively low. AMOVA confirmed these results: most of the genetic variation was found within latitude and longitude populations (see Table 6).
The Mantel test indicated a positive correlation between geographic and genetic distance (r = 0.207, p < 0.05), which suggests that geographic distance limits gene flow among populations and influences genetic structure. For further analysis, spatial autocorrelation was assessed in 100 km distance classes, and the correlation coefficient (r) generally declined with distance. The correlation values were negative and significant up to 250 km. This revealed a clinal spatial structure in wild soybeans in Northeast China (ω = 119.58, p < 0.01) (see Figure 3).
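The Mantel test correlates the entries of two distance matrices and judges significance by permuting the sample labels of one matrix. The sketch below is a plain permutation version with a one-tailed p value; it is an illustration, not GenAlEx's implementation, and the function names, seed, and default of 999 permutations are arbitrary choices.

```python
import math
import random


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(geo, gen, permutations=999, seed=1):
    """One-tailed Mantel test between two symmetric distance matrices.

    geo, gen: square lists of lists over the same samples, in the same order.
    Returns (r, p): r is the Pearson correlation of the upper-triangle
    entries; p is the share of permutations (plus the observed case)
    whose r is at least the observed r.
    """
    n = len(geo)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [geo[i][j] for i, j in pairs]
    r_obs = _pearson(x, [gen[i][j] for i, j in pairs])

    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # relabel rows and columns of gen together
        y = [gen[order[i]][order[j]] for i, j in pairs]
        if _pearson(x, y) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

Rows and columns are permuted together because the units of a distance matrix are the samples, not the individual pairwise entries, which are not independent of one another.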

Discussion
The wild soybean in Northeast China is an important ecotype, and its genetic diversity has been widely studied using phenotypic traits [7,30,31] and molecular markers [17]. The common view has been that the wild soybean in this region possesses high genetic variation. In this study, the average number of alleles, Shannon index (I), and expected heterozygosity ($H_e$) were 22.628, 2.528, and 0.879, respectively, which were considerably higher than previously reported values [32][33][34]. This rich genetic variation could be attributed to the fact that the samples were drawn from a core collection, which is defined as a subset of a species' germplasm collection that captures as much of its diversity as possible with minimal redundancy [35,36]. In addition, the large sample size and diverse geographical origins might also have contributed to the high diversity: the 205 samples used in this study were selected from the 242 Northeast China accessions in the core collection developed by Zhao et al. [20], accounting for 85% of them, and they appear to be distributed more or less continuously across Northeast China (see Supplementary Table 1). Furthermore, the use of 43 SSR markers covering all 20 linkage groups may also have contributed to the detection of greater genetic variation [37].
The wild soybean is widely distributed in Northeast Asia. The higher its genetic diversity, the greater its capacity for habitat expansion and environmental adaptation [38]; consequently, it has formed a specific natural distribution pattern [39]. In our study, the 42°N and 124°E regions had the highest levels of genetic diversity (see Table 2), which roughly agrees with previous studies based on morphological traits [7]. The results support the view that the genetic diversity of wild soybeans in Northeast China is related to latitude but not to longitude; three evolutionarily significant units were distinguished by latitude, corresponding to the <41°N-42°N, 43°N-44°N, and 45°N->49°N regions (see Tables 3-5). Moderate differentiation occurred among the latitude populations (mean $F_{st}$ = 0.088) and the longitude populations (mean $F_{st}$ = 0.058) (see Table 6). This implies that natural selection might be the main cause of the genetic structure [8,29]. Previous studies have revealed that wild soybean genotypes exhibit regional distributions at different geographical scales [15,40,41], which are especially associated with latitudinal origin. However, some studies have reported genetic differentiation associated with longitudinal origin; for example, Leamy et al. found that four genetic groups (Central China, Northern China, Korea, and Japan) differed more in longitude than in latitude [13]. Possible explanations for those results include small sample size, a large geographic span, isolation by straits, and diverse ecosystems. Genetic structure is mainly determined by the breeding system, gene flow, isolation by distance, and other factors [42]. The wild soybean is a predominantly self-pollinating plant with limited pollen flow.
In general, predominantly self-pollinating plants have an average $G_{st}$ of 0.51, meaning that more than half of the total genetic variation lies among populations, whereas predominantly outcrossing species have an average $G_{st}$ of 0.10, meaning that about 90% of the genetic variation lies within populations [43]. In the present study, most of the genetic variation was found among individuals within populations, with less than 10% among populations (see Table 6). This suggests that the pattern of genetic differentiation among latitude or longitude populations in Northeast China resembles that of outcrossing-dominated species [17,40]. Zhao speculated that this phenomenon could be explained by outcrossing rates and long-distance gene flow [44]. Our results show that the outcrossing rate of wild soybeans in Northeast China is only 0.7% ($F_{is}$ = 0.987) (see Table 1), which confirms their predominantly selfing mating system; this mating system plays an important role in maintaining a strong genetic structure.
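The $G_{st}$ benchmarks above follow Nei's partition of heterozygosity, $G_{st} = (H_T - H_S)/H_T$, where $H_S$ is the mean within-population expected heterozygosity and $H_T$ is the expected heterozygosity computed from allele frequencies averaged over populations; $1 - G_{st}$ is then the share of variation within populations. A single-locus sketch with equally weighted populations follows. The differentiation values reported in this study come from AMOVA in GenAlEx, so this is an analogous measure rather than the one used, and the function name is hypothetical.

```python
def nei_gst(pops):
    """Nei's G_st for one locus from per-population allele frequencies.

    pops: list of dicts (allele -> frequency), one per population; all
    populations are weighted equally. G_st = (H_T - H_S) / H_T.
    """
    k = len(pops)
    alleles = set().union(*pops)
    # H_S: mean expected heterozygosity within populations.
    hs = sum(1 - sum(f * f for f in p.values()) for p in pops) / k
    # H_T: expected heterozygosity from the mean allele frequencies.
    mean_freq = {a: sum(p.get(a, 0.0) for p in pops) / k for a in alleles}
    ht = 1 - sum(f * f for f in mean_freq.values())
    return (ht - hs) / ht if ht > 0 else 0.0
```

Two populations fixed for different alleles give $G_{st} = 1$ (all variation among populations), identical populations give 0, and a value of 0.10 corresponds to the 90% within-population share cited for outcrossing species.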
The Mantel test revealed a positive relationship between geographic and genetic distance (r = 0.207, p < 0.05), which indicates that geographical isolation has also been an important factor in shaping the current genetic structure of wild soybeans in Northeast China. Spatial autocorrelation analysis, however, showed that this correlation is limited in extent, being significant only up to about 250 km.
The wild soybean in Northeast China has become one of the most severely endangered wild plant species because of human activities [45]. The high genetic diversity of wild soybeans in this region indicates great evolutionary potential, so in situ conservation is preferable. Although more wild soybean accessions from Northeast China have been conserved ex situ in the National Gene Bank than from other areas, the collection from this region is still very limited, and further investigation and collection are necessary. Because most of the genetic variation of wild soybeans in Northeast China lies among individuals within populations, the conservation strategy should emphasize protecting individuals, and areas with high genetic diversity should be given priority.

Conclusions
In summary, the genetic diversity and geographic population structure of the wild soybean in Northeast China were investigated for the first time, both by treating the region as a single population and by dividing it into latitude and longitude populations. The distribution of genetic variation is related to latitude, with the highest level of genetic diversity found at 42°N; protection in areas with higher genetic diversity should be prioritized. The results suggest that natural selection for adaptation to temperature and photoperiod, a predominantly selfing mating system, and isolation by distance have shaped the current population structure of wild soybean in Northeast China.

Data Availability
All the wild soybean accessions in this study were preserved in the Gene Bank of Jilin Academy of Agricultural Sciences, and the geographical information for all samples is provided in Supplementary Table 1. The primer sequences and their linkage group locations are available at https://www.soybase.org/dlpages/#soybasedata.

Conflicts of Interest
All authors have reported no financial interests or potential conflicts of interest regarding the publication of this article.