Genome-Wide Development of Polymorphic Microsatellite Markers and Genetic Diversity Analysis for the Halophyte Suaeda aralocaspica (Amaranthaceae)

Suaeda aralocaspica, an annual halophyte, grows in the saline deserts of Central Asia and has potential use in saline soil reclamation and salt tolerance breeding. Studying its genetic diversity is critical for effective conservation and breeding programs. In this study, we aimed to develop a set of polymorphic microsatellite markers to analyze the genetic diversity of S. aralocaspica. We identified 177,805 SSRs in the S. aralocaspica genome, with an average length of 19.49 bp and a density of 393.37 SSRs/Mb. Trinucleotide repeats were the dominant motif type (75.74%), and the main motif was CAA/TTG (44.25%). We developed 38 SSR markers that exhibited substantial polymorphism, with an average of 6.18 alleles per locus and an average polymorphism information content (PIC) of 0.516. The markers were used to evaluate the genetic diversity of 52 individuals collected from three populations of S. aralocaspica in Xinjiang, China. The results showed that the genetic diversity was moderate to high, with a mean expected heterozygosity (He) of 0.614, a mean Shannon's information index (I) of 1.23, and a mean genetic differentiation index (Fst) of 0.263. The SSR markers developed in this study provide a valuable resource for future genetic studies and breeding programs of S. aralocaspica, and potentially of other Suaeda species.


Introduction
Suaeda, which comprises more than 100 species, is a genus of flowering plants in the family Amaranthaceae [1]. These plants are halophytic herbs or shrubs commonly found in Asia, Europe, North America, and on seashores worldwide [2]. Suaeda plants are highly adapted to extreme salt and water conditions [3]. They have a unique structure that allows them to store water and tolerate high levels of soil salinity [4]. In addition to their medicinal properties, some Suaeda species are used for the reclamation of degraded saline lands, as well as in the production of biofuels due to their high oil content [5]. Suaeda represents a significant ecological and economic resource for many countries around the world [3].
Suaeda aralocaspica is a plant species with succulent leaves that belongs to the family Amaranthaceae. This annual halophyte is native to Central Asia and can be found in the salty deserts of the region. In China, it is found in the cold desert regions of the Junggar Basin in Xinjiang [6,7]. It carries out complete C4 photosynthesis within individual cells but lacks the characteristic leaf anatomy of other C4 plants. These features make it potentially valuable for engineering higher photosynthetic efficiency in agriculturally important C3 species such as rice [8]. Seed heteromorphism, which refers to

Analysis of the Distribution of SSRs in the Genome of S. aralocaspica
MISA software was used to screen the 452 Mb genome of S. aralocaspica, which led to the detection of 177,805 SSRs with a total length of 3,465,418 bp. The frequency and density of SSRs in the genome were 393.37 SSRs/Mb and 7666.85 bp/Mb, respectively, and SSRs represented approximately 0.77% of the whole genome sequence (Table 1). SSR length ranged from 12 bp to 9862 bp, with an average of 19.49 bp. The most frequent SSR length was 12 bp, which occurred 73,445 times, followed by 15 bp, 18 bp, and 16 bp, which occurred 28,116, 17,288, and 10,626 times, respectively (Figure 1). The number of motif repeats at each SSR locus ranged from 4 to 1551; most loci had four tandem repeats (43.77%), followed by those with five tandem repeats (17.57%) (Figure 2).
Trinucleotide repeats were the most common, followed by dinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide repeats in the whole genome of S. aralocaspica. The total length of SSRs in the genome was 3,465,418 bp, and the total lengths of SSRs with di-, tri-, tetra-, penta-, and hexanucleotide repeats were 473,878 bp, 2,383,446 bp, 337,516 bp, 113,540 bp, and 157,038 bp, respectively. The average SSR lengths for these repeat types were 18.76 bp, 17.70 bp, 30.27 bp, 30.11 bp, and 52.96 bp, respectively (Table 2).
In the entire genome of S. aralocaspica, the 177,805 SSRs identified comprised 355 repeat motif types. These were categorized into di-, tri-, tetra-, penta-, and hexanucleotide repeats, with 4, 10, 32, 88, and 221 motif types, respectively. The most frequent repeat motif was AAC/GTT (44.25%), followed by AT/AT (10.08%), AAAT/ATTT (2.85%), AAAAT/ATTTT (0.54%), and AACAAT/ATTGTT (0.15%). Among all the repeat motifs, di-, tri-, and tetranucleotide repeats were the most prevalent types, whereas penta- and hexanucleotide repeats, despite comprising 309 of the motif types, accounted for only 3.79% of all SSRs (Table 3).
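The summary statistics above follow from a few reported totals, and the short sketch below recomputes them as a consistency check. It uses only numbers stated in the text; the function and variable names are illustrative, 1 Mb is assumed to equal 1,000,000 bp, and the per-class locus counts are approximated as total length divided by mean length because the counts themselves are not quoted here.

```python
# Values as reported in the text (Tables 1 and 2 of the source).
GENOME_MB = 452
N_SSR = 177_805
TOTAL_SSR_BP = 3_465_418

# Total length (bp) and mean length (bp) of SSRs per repeat class, as reported.
CLASS_TOTAL_BP = {"di": 473_878, "tri": 2_383_446, "tetra": 337_516,
                  "penta": 113_540, "hexa": 157_038}
CLASS_MEAN_BP = {"di": 18.76, "tri": 17.70, "tetra": 30.27,
                 "penta": 30.11, "hexa": 52.96}


def genome_summary():
    """Recompute the genome-wide SSR statistics from the reported totals."""
    return {
        "frequency_per_mb": N_SSR / GENOME_MB,
        "density_bp_per_mb": TOTAL_SSR_BP / GENOME_MB,
        # Assumption: 1 Mb = 1,000,000 bp.
        "genome_percent": 100 * TOTAL_SSR_BP / (GENOME_MB * 1_000_000),
        "mean_length_bp": TOTAL_SSR_BP / N_SSR,
    }


def class_shares():
    """Approximate each class's share of SSR loci (percent).

    Locus counts are estimated as total length / mean length, so the
    shares are only as precise as the rounded mean lengths.
    """
    counts = {k: CLASS_TOTAL_BP[k] / CLASS_MEAN_BP[k] for k in CLASS_TOTAL_BP}
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}


if __name__ == "__main__":
    for name, value in genome_summary().items():
        print(f"{name}: {value:.2f}")
    for name, value in class_shares().items():
        print(f"{name}: {value:.2f}%")
```

Under these assumptions the recomputed trinucleotide share agrees with the reported 75.74% to within rounding, and the combined penta- and hexanucleotide share comes out close to the reported 3.79%.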

Development of Genome SSR Markers for S. aralocaspica
A set of 100 candidate SSR primer pairs was chosen for S. aralocaspica, of which 88 produced clear bands in eight or more of the ten tested individuals. These loci were further tested for polymorphism using UV observation of PCR products after 2% agarose gel electrophoresis. The resulting data were analyzed using Cervus v3.0.7, which identified 38 highly polymorphic SSR loci (Table S1). These loci were then used to genotype three distinct populations of S. aralocaspica. A total of 52 samples collected from the three populations were analyzed using the 38 SSR loci (Table 4). The loci yielded a total of 235 alleles, with the number of alleles (Na) per locus ranging from 3 to 12 and averaging 6.18. Loci SA-di-63 and SA-te-94 had the highest number of alleles, while SA-te-29 and SA-te-98 had the lowest. The average number of effective alleles (Ne) across all loci was 2.96, with a maximum of 5.131 observed at locus SA-tri-85 and a minimum of

Genetic Diversity Analysis of S. aralocaspica Populations
The Na values of the selected markers amplified for the three populations of S. aralocaspica ranged from 2.921 to 4.447, while Ne ranged from 1.794 to 2.667. The values for I ranged from 0.642 to 1.085, Ho ranged from 0.207 to 0.299, He ranged from 0.365 to 0.573, uHe ranged from 0.376 to 0.591, and F ranged from 0.515 to 0.602 (Table 5). The analysis of molecular variance (AMOVA) of the S. aralocaspica populations revealed that 35% of the genetic variation was present within individuals, 32% among populations, and 33% among individuals.
In the populations studied, the genetic identity between the Shawan and Shihezi S. aralocaspica populations was the highest, at 0.654, whereas the genetic identities of the Fukang-Shawan and Fukang-Shihezi pairs were lower, at 0.438 and 0.548, respectively (Table 6). This was further supported by the UPGMA analysis, in which the Shawan and Shihezi populations were genetically closest to each other, while the Fukang and Shawan populations differed the most (Figure 3).
Figure 3. UPGMA clustering based on genetic distances using Mega 6, with different colors representing different populations: Shawan (S1-S18) in green, Shihezi (M1-M16, M20) in pink, and Fukang (up to B22; missing B4, B7, B12, and B17) in yellow.
The PCoA indicated that the first principal coordinate accounted for 31.91% of the overall genetic variation, with the second and third principal coordinates accounting for 14.72% and 12.60%, respectively. Together, the three principal coordinates explained 59.22% of the variation (Figure 4).

Discussion
Suaeda is a valuable and representative genus among halophytes, with significant research potential. However, the development of molecular markers for Suaeda has been limited, hindering the progress of molecular ecology and population genetics studies on this genus. To address this gap, we used the sequencing data of S. aralocaspica to identify SSRs in its genome and subsequently developed 38 SSR markers with good polymorphism. These markers can be used to analyze the genetic diversity of S. aralocaspica and contribute to the study of the genetic diversity and structure of Suaeda plants.
After performing the analysis, we discovered that the frequency of SSRs in S. aralocaspica's entire genome was 393.37 SSRs/Mb. This amount was significantly higher when compared to Anemone coronaria (65.52 SSRs/Mb), Solanum melongena (120 SSRs/Mb), and Triticum aestivum (36.68 SSRs/Mb) [27][28][29]. Therefore, it can be inferred that SSRs were abundant in the whole genome of S. aralocaspica. Dinucleotide and trinucleotide repeats were the most common SSRs in the genome of S. aralocaspica, similar to Tartary buckwheat [30]. However, there is a difference in abundance between the two, as dinucleotide repeats were the most common in Tartary buckwheat (63.95%), whereas trinucleotide repeats were more common in S. aralocaspica (75.74%).
In this study, we examined the genetic diversity of S. aralocaspica populations using the 38 pairs of SSR primers developed here. We found that the expected heterozygosity (He) and Shannon's information index (I) values for the three populations ranged from 0.288 to 0.805 and 0.642 to 1.085, respectively. Our results can be compared with those obtained for other species, such as S. maritima (He = 0.37, I = 0.97) [25], S. corniculata subsp. mongolica (I = 0.1688), S. "jacutica" (I = 0.0878), and S. corniculata s. str. [23], as well as with other studies [31][32][33][34][35][36]. It is essential to note that the plant species, the SSR loci, and the number of markers used can all affect the results of genetic diversity analysis.
The genetic diversity analysis of S. aralocaspica populations from three distinct regions revealed differences in their genetic diversity. Specifically, the genetic diversity of S. aralocaspica populations in Fukang was significantly greater than that of populations in Shihezi and Shawan. This may be due to the different habitats of the three populations. For instance, the S. aralocaspica population in Fukang is located near a protected reservoir, making it less susceptible to human activities and boasting relatively stable conditions. It also has a larger habitat area and population size compared to those in Shihezi and Shawan. Conversely, the S. aralocaspica populations in Shihezi and Shawan are more frequently impacted by human activities and experience greater environmental volatility. This may explain the significant differences in genetic diversity observed between the three populations.
In this study, we compared the genetic identity and actual geographic distance of three distinct S. aralocaspica populations. Our findings revealed that the S. aralocaspica populations of Shihezi and Shawan had the closest geographic distance and the highest genetic identity. Conversely, the Fukang and Shawan populations showed the greatest geographic distance and the lowest genetic similarity. These results indicate an inverse relationship between genetic identity and geographic distance among the three S. aralocaspica populations. This suggests that geographic distance may play a vital role in influencing gene flow among different S. aralocaspica populations.
Gene flow reduces genetic differentiation between populations, particularly when the number of migrants per generation (Nm) is greater than 1. When Nm is less than 1, local differentiation between populations tends to occur. The collected samples of S. aralocaspica showed substantial genetic differentiation, with mean Fst values above 0.25. This suggests that S. aralocaspica may have undergone local adaptation due to high selection pressure, which can occur even in the presence of high levels of gene flow, according to Endler et al. [37]. In combination with other data, it is possible that high selection pressure has contributed to the local adaptation of S. aralocaspica [38]. Factors that influence plant genetic diversity include species-related factors such as mating systems, bottleneck effects, evolution, and life history, as well as anthropogenic factors.
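The link between Fst and Nm discussed here can be made concrete with a small worked example. The text does not state how Nm was estimated, so the sketch below assumes the widely used island-model approximation Nm ≈ (1 − Fst) / (4 Fst); the function name is illustrative, and the input 0.263 is the mean Fst reported in this study.

```python
def migrants_per_generation(fst):
    """Estimate Nm from Fst using the island-model approximation.

    Assumption (not stated in the source): Nm = (1 - Fst) / (4 * Fst).
    """
    if not 0 < fst < 1:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1 - fst) / (4 * fst)


if __name__ == "__main__":
    # Mean Fst reported for the three S. aralocaspica populations.
    print(f"Nm at Fst = 0.263: {migrants_per_generation(0.263):.3f}")
    # Under this approximation, Nm = 1 corresponds to Fst = 0.2.
    print(f"Nm at Fst = 0.200: {migrants_per_generation(0.2):.3f}")
```

Under this approximation, any Fst above 0.2 implies Nm below 1, so a mean Fst of 0.263 corresponds to roughly 0.7 migrants per generation.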
Outcrossing species typically exhibit greater genetic diversity than selfing species [39,40]. Although there is no definitive evidence on the mating system of S. aralocaspica, its monoecious annual nature and inbreeding coefficients exceeding 0.5 in all three populations suggest pronounced inbreeding. Further analysis is necessary to determine the exact mating system. Mating among close relatives in small populations often happens out of necessity and results in high inbreeding coefficients and a decline in genetic diversity [41]. S. aralocaspica populations nevertheless exhibit high genetic diversity. This may be because natural or anthropogenic factors have caused severe habitat fragmentation and a significant reduction in population size, with the remaining small populations retaining a fraction of the genetic variation of the original large populations. Further investigation is needed to establish the exact causes of this phenomenon.

Plant Material
The 52 fresh plant samples were obtained from three distinct populations of S. aralocaspica (17 individuals from Fukang (87°40′ E, 44°13′ N), 17 individuals from Shihezi (87°14′ E, 44°45′ N), and 18 individuals from Shawan (85°50′ E, 44°36′ N)) in July 2021 in Xinjiang, China. Fresh leaves of S. aralocaspica from different habitats were collected using the quadrat method. After collection, the samples were carefully wrapped in tinfoil and preserved in liquid nitrogen to maintain their freshness. Subsequently, the plant materials were stored in an ultra-low-temperature refrigerator at −80 °C until they were used for subsequent experiments.

Genome SSR Identification and Development
This study used the whole-genome sequencing data of S. aralocaspica, obtained from the open-access data of NCBI (BioProject accession PRJNA428881), as the basis for developing a set of highly polymorphic SSR primers. We considered microsatellites with motif sizes of 2-6 bp, excluding mononucleotide repeats. Microsatellite loci were identified with the MISA software, using a minimum repeat number of 4. We then used Primer 3 software to design primers specific to the flanking genomic sequences of the microsatellite regions. The expected amplicon length ranged from 100 to 300 bp.
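The search criteria described above (motifs of 2-6 bp, at least four tandem repeats) can be illustrated with a small regular-expression scan. This is a minimal sketch and not MISA's implementation: `find_ssrs` and `is_primitive` are hypothetical helper names, only perfect repeats are detected, a single minimum repeat number is applied to all motif sizes, and reverse-complement motifs (for example AAC/GTT) are not merged.

```python
import re


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit."""
    # A string is periodic exactly when it occurs inside its own doubling
    # with the first and last characters removed.
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # One k-mer followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
                pos = match.end()
            else:
                # Skip units such as "AA" or "ATAT", which belong to a
                # shorter motif class, and continue one base further on.
                pos = match.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence containing four copies of CAA.
    print(find_ssrs("GGCAACAACAACAACTT"))
```

Flanking sequence around each reported start position would then be passed to a primer design tool; that step is not sketched here because the primer design parameters are not given in the text.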

PCR Amplification and Electrophoresis Detection
Genomic DNA from S. aralocaspica plant material was extracted using the DNAsecure Plant Kit (Tiangen Biotechnology, Beijing, China), following the manufacturer's guidelines. From all genomic SSRs, 100 candidate primer pairs were randomly selected based on minimum lengths of 14, 18, and 20 bp for di-, tri-, and tetranucleotide repeats, respectively [42,43]. The 5′ ends of the forward primers were labeled with the blue fluorescent dye FAM (Shanghai General Biotechnology, Shanghai, China) for easy scoring in genotyping. PCR amplification was performed using the selected primers for all sample DNA in a 25 µL reaction system containing 1 µL of template DNA, 1 µL each of upstream and downstream primers, 12.5 µL of 2× EasyTaq PCR SuperMix, and 9.5 µL of ddH2O. The amplification reaction procedure consisted of three stages: pre-denaturation at 94 °C for 3 min in the first stage and, in the second stage, 30 cycles of denaturation at 94 °C for 30 s, annealing at 58 °C for 30 s, and extension at 72 °C for 30 s. Amplification was considered successful when at least eight of the ten tested individuals produced bright bands on 2% agarose gel electrophoresis. Allele data (.fsa files) generated after PCR amplification were analyzed using the GeneMarker v2.2.0 (SoftGenetics LLC., State College, PA, USA) genotyping software. Based on this analysis, primers with good polymorphism were selected to amplify the collected S. aralocaspica population material, and the amplification results were then analyzed.

Data Analysis
Various genetic diversity indices were computed using different software packages. Popgene 32 [44], Cervus v3.0.7, and GenAlEx 6.5 were used to calculate the number of alleles (Na), number of effective alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), and polymorphism information content (PIC) [45]. Nei's gene diversity index was also calculated using Nei's genetic distance. Subsequently, UPGMA clustering was performed for all S. aralocaspica samples using Mega6 [46]. Principal coordinate analysis (PCoA) and genetic similarity analysis were carried out using GenAlEx 6.5 software. DataFormater 2010 was used to convert the data types [47].
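For readers unfamiliar with these indices, the sketch below shows how they can be computed for a single diploid locus from allele frequencies. It uses the standard textbook definitions and is not a reproduction of the internals of Popgene, Cervus, or GenAlEx; the function name, the dictionary keys, and the example allele sizes (150 and 154) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def locus_stats(genotypes):
    """Compute diversity indices for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)

    he = 1 - homozygosity                               # expected heterozygosity
    ho = sum(1 for a, b in genotypes if a != b) / n     # observed heterozygosity
    # PIC: He minus the sum over allele pairs of 2 * p_i^2 * p_j^2.
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {
        "Na": len(counts),                              # number of alleles
        "Ne": 1 / homozygosity,                         # effective alleles
        "I": -sum(p * log(p) for p in freqs),           # Shannon's index
        "Ho": ho,
        "He": he,
        "uHe": he * 2 * n / (2 * n - 1),                # unbiased He
        "F": 1 - ho / he if he > 0 else float("nan"),   # fixation index
        "PIC": pic,
    }


if __name__ == "__main__":
    # Hypothetical genotypes (fragment sizes in bp) for four individuals.
    example = [(150, 150), (150, 154), (154, 154), (150, 154)]
    for name, value in locus_stats(example).items():
        print(f"{name}: {value:.4f}")
```

Population-level values such as those in Table 5 would be averages of these per-locus values across the 38 loci.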

Conclusions
Based on the sequencing of the whole genome of S. aralocaspica, this study has successfully developed a large set of polymorphic microsatellite markers that can be used for the genetic diversity analysis of halophyte S. aralocaspica. The results of the genetic diversity analysis showed that the populations of S. aralocaspica had moderate levels of genetic differentiation and high levels of genetic diversity within each population. The microsatellite markers developed in this study provide a valuable tool for future population genetics studies and conservation efforts, not just for this species but also for other Suaeda species. Overall, this study highlights the importance of genetic diversity analysis in understanding the adaptive potential and conservation of halophyte species in changing environments.