Development and Characterization of SSR Markers via de Novo Transcriptome Assembly in Apocynum hendersonii

ABSTRACT To provide a theoretical and practical foundation for the genetic analysis of Apocynum species, SSRs were identified from transcriptomic data of A. hendersonii leaves. A total of 5,660 SSR loci were identified in 50,957 unigenes, with an average of one SSR locus per 9.09 kb. Dinucleotide repeat motifs were the most prevalent, accounting for 48.23% of all repeats; the dominant motifs were AG/CT (1,703) followed by A/T (1,222). Furthermore, genetic relationship analysis based on the allele polymorphisms of the selected SSR primers showed that A. hendersonii and A. venetum grouped into separate clades, and that germplasms from different regions clustered into different subgroups. These SSR markers may provide a foundation for genetic identification, diversity analysis, and marker-assisted breeding in Apocynum species.


Introduction
Apocynum hendersonii and its relative Apocynum venetum are the two main species in the genus Apocynum of the Apocynaceae family and are widely distributed in China from the northwest to the coast of the Yellow Sea (Wang, Han, and Zhang 2007; Xie et al. 2012; Xu et al. 2020; Yuan, Li, and Jia 2020). Both species are adapted to extreme environments with low rainfall (less than 50 mm annually) and harsh arid conditions, such as saline and alkaline deserts, river banks, alluvial plains, and the Gobi desert. Their eco-economic, medicinal, and textile value has drawn widespread attention, and both have been rapidly developed and cultivated as commercial species, especially in northwestern China (Gao et al. 2019). A. venetum is well known for its broad medicinal properties, which derive from its numerous phytochemicals (Eshbakova, Bahang, and Aisa 2011). In addition, A. venetum is a good source of cellulosic bast fiber with antibacterial and anti-UV properties (Wang, Han, and Zhang 2007). Its fiber is especially suitable for skin-contact clothing and has attracted increasing attention in textile product innovation. A. hendersonii is morphologically similar to A. venetum but differs markedly in the color and shape of its flowers and leaves (Gao et al. 2019). At the phytochemical level, the contents of flavonoid components such as isoquercitrin and hyperoside are significantly lower in A. hendersonii than in A. venetum. These differences may serve as markers for discriminating between the two species (Chan et al. 2015).
Although A. venetum and A. hendersonii are widespread in China, their natural populations have recently declined sharply due to habitat degradation and overexploitation. Moreover, the lack of elite cultivars restricts the development of these industrial crops. Understanding their genetic variation will therefore support genetic improvement and molecular breeding of these species. Molecular markers, especially SSRs, are powerful tools in molecular ecology and germplasm diversity studies (Park, Lee, and Kim 2009; Varshney, Graner, and Sorrells 2005). A previous study of A. venetum predicted 101,918 SSR markers (Li et al. 2019). However, genomic SSR markers tend to have lower transferability than EST-SSR markers (Zalapa et al. 2012). EST-based SSR markers are located in the coding regions of genes and can be detected more accurately, quickly, and efficiently than genomic SSR markers. Moreover, few EST-based markers have been developed in A. venetum and its close relative A. hendersonii (Yuan, Li, and Jia 2020). In this study, we generated a de novo transcriptome assembly from an RNA-seq library of A. hendersonii leaf and developed a large number of SSR markers from the assembled transcripts. We also identified polymorphisms in randomly selected SSR markers and applied them in a cDNA-based genetic diversity analysis. The results of this study may help in the SSR-assisted breeding of Apocynum plants and give new insights into the genetic research of A. venetum and A. hendersonii.

Plant materials
A. hendersonii (No. 9, 10, 11, 12, 13, 14, and 15) and its relative A. venetum (No. 1 and 2) were collected from the Xinjiang Uygur Autonomous Region, an area that receives very little rainfall and is characteristically saline and alkaline. A second collection (No. 3, 4, 5, 6, 7, 8, 16, and 17) was sourced from the coastal tidal flats and alluvial plains of Shandong Province. These germplasms were cultivated in the greenhouse of the resources nursery of the*** (Table SP1).
The plant morphology of all collected germplasms was carefully recorded, and their chromosome karyotypes were analyzed using fluorescence in situ hybridization with chromosome-specific DNA painting probes (Xiong and Pires 2011). Germplasm No. 12 (A. hendersonii, widely distributed over 10 hectares in patches), collected from the Gobi desert of Xinjiang, was selected for RNA sequencing and transcriptome assembly.

RNA extraction, library construction and transcriptome sequencing
Total RNA was extracted using TRIzol Reagent (Invitrogen, USA), and libraries were constructed using an RNA-seq Library Prep Kit for Illumina (VAHTS® Universal V8, CHN). The integrity of the RNA and the RNA-seq library was assessed using an Agilent 2100 BioAnalyzer system (Agilent, USA). The libraries were sequenced on the BGISEQ-500 platform (BGI, China), which is based on sequencing by synthesis, with 100 bp paired-end reads (BGI Technologies, China). The raw data were filtered by trimming adaptor sequences and removing low-quality sequences (Q < 20) and sequences with more than 10% uncertain (N) bases. The clean reads were then de novo assembled into unigenes using the short-read assembly program Trinity, with min_kmer_cov set to 2 and all other parameters at their default values (Grabherr et al. 2011; Pertea et al. 2003).
Furthermore, a total of 100 SSR primer pairs were designed using Primer Premier 5.0 (www.bio-soft.net) with the following parameters: primer length of 18-25 bp with an optimal length of 22 bp; PCR product size of 150-300 bp; annealing temperature of 55-60°C with an optimal temperature of 58°C; and GC content of 40-60%. All primers were synthesized and purified by Sangon Biotech (Shanghai, China).
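The design constraints listed above can be expressed as a simple filter. The following is a minimal sketch, not the Primer Premier procedure: it checks only primer length, GC content, and expected product size against the stated ranges, and it does not estimate annealing temperature. The function names and the example primer are hypothetical and are used for illustration only.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def primer_passes(primer: str, product_size: int) -> bool:
    """Check a primer against the length, GC, and product-size ranges in the text."""
    return (
        18 <= len(primer) <= 25                  # primer length of 18-25 bp
        and 40.0 <= gc_content(primer) <= 60.0   # GC content of 40-60%
        and 150 <= product_size <= 300           # PCR product of 150-300 bp
    )


# Hypothetical primer sequence, made up for illustration only.
example = "ATGCTAGCTAGGCTAGCTAGCA"
print(len(example), gc_content(example), primer_passes(example, 200))
```

A filter of this kind is useful only as a first pass; annealing temperature and secondary-structure checks still need a dedicated primer design tool.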

SSR marker characterization and evaluation
The identified SSR markers were amplified and validated in all 17 Apocynum germplasm collections to analyze SSR polymorphisms. PCR was carried out on a Bio-Rad T100™ Thermal Cycler (Bio-Rad, USA) in a total volume of 20 μL containing 1 U Taq DNA polymerase (TIANGEN, China), 1 × PCR buffer, 1.5 mM MgCl2, 0.52 mM dNTPs (TIANGEN, China), 0.4 μM of each primer, and 20 ng of genomic DNA. The reaction proceeded as follows: denaturation at 94°C for 5 min; 30 cycles of denaturation at 94°C for 30 s, annealing at 50-58°C for 30 s, and elongation at 72°C for 1 min; and a final elongation at 72°C for 5 min. The PCR products were purified using a Gel Extraction Kit (Omega, USA) and then separated by 8% non-denaturing polyacrylamide gel electrophoresis. Clear bands were visualized with 0.1% silver staining and genotyped (Creste, Tulmann Neto, and Figueira 2001). The polymorphism analysis was conducted using POPGENE version 1.32 and NTSYSpc version 2.1.

Morphology, chromosome karyotype analysis and transcriptome assembling
All 17 germplasms showed similar morphology, but A. venetum and A. hendersonii differed visibly in stem and leaf color and in flower shape. The A. venetum germplasms had dark-red stems and smaller flowers than A. hendersonii. Notably, their stamens adhered tightly to the stigma, and the anthers dehisced before the corolla opened. These observations suggest that both A. venetum and A. hendersonii are highly self-pollinating. The chromosome analysis of A. venetum and A. hendersonii indicated a normal diploid karyotype of 2n = 22 (Figure 1).
A total of 7.4 Gb of raw data were obtained for the A. hendersonii transcriptome assembly using the BGISEQ-500 platform (BGI, China) with 100 bp paired-end reads, and the data are available at https://www.ncbi.nlm.nih.gov/sra/SUB10126041. The final transcriptome assembly contained 50,957 unigenes with a total length of 51,426,191 bp. The average length, N50, and GC content were 1,021 bp, 1,632 bp, and 44.12%, respectively (Table 1).
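For readers unfamiliar with these assembly summary statistics, the sketch below shows how N50 and GC content are conventionally computed from a set of assembled sequences. It is an illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, and the toy sequences are invented solely to exercise the code.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequences given")


def assembly_stats(sequences):
    """Summarize an assembly: count, total length, mean length, N50, GC%."""
    lengths = [len(s) for s in sequences]
    total = sum(lengths)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return {
        "count": len(sequences),
        "total_bp": total,
        "mean_bp": total / len(sequences),
        "n50_bp": n50(lengths),
        "gc_percent": 100.0 * gc / total,
    }


# Toy sequences, invented for illustration only.
toy = ["ATGCGC", "ATAT", "GGCC"]
print(assembly_stats(toy))
```

N50 is less sensitive to large numbers of very short sequences than the mean length, which is why both values are usually reported together.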

Characteristics of the SSR loci in A. hendersonii
Developing SSR markers from transcriptome sequences is essential for finding associations between functional genes and phenotypes, especially in non-model species (Li et al. 2002; Zalapa et al. 2012). Among the 50,957 A. hendersonii unigenes (51,426,191 bp), 5,660 SSR loci were identified in 4,848 unigenes. Of these 4,848 unigenes, 3,800 contained a single SSR locus and 1,048 contained two or more SSR loci. The overall occurrence frequency of SSR loci was 11.11%, with an average of one SSR locus per 9.09 kb. The A. hendersonii transcriptome SSRs comprised several repeat types: mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide. The genomic SSRs of the species, however, were reported to comprise only dinucleotide (48.23%), trinucleotide (26.71%), and mononucleotide (23.45%) repeats (Li et al. 2019). The tetranucleotide (0.08%), pentanucleotide (0.05%), and hexanucleotide (0.04%) repeats obtained in this study were very few. These proportions are consistent with the SSR results observed in the transcriptomes of other plants (Gupta et al. 2003; Kalia et al. 2011; Simko 2008). Short repeat motifs show a relatively high mutational frequency in many species (Toth, Gaspari, and Jurka 2000). Our results showed many short repeats (mononucleotide, dinucleotide, and trinucleotide) in A. hendersonii, suggesting a high mutational frequency.
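The frequency and density figures above follow directly from the reported counts. The short calculation below reproduces them, assuming that the occurrence frequency is defined as SSR loci per 100 unigenes and the density as total assembly length divided by the number of SSR loci; the variable names are chosen here for illustration.

```python
# Counts reported in the text.
unigenes = 50_957          # assembled unigenes
total_bp = 51_426_191      # total assembly length in bp
ssr_loci = 5_660           # identified SSR loci
single_ssr = 3_800         # unigenes with exactly one SSR locus
multi_ssr = 1_048          # unigenes with two or more SSR loci

# Assumed definitions (see lead-in): loci per 100 unigenes, and kb per locus.
frequency_percent = 100 * ssr_loci / unigenes
kb_per_ssr = total_bp / ssr_loci / 1000
ssr_unigenes = single_ssr + multi_ssr

print(f"SSR-containing unigenes: {ssr_unigenes}")
print(f"Occurrence frequency: {frequency_percent:.2f}%")
print(f"Average spacing: one SSR locus per {kb_per_ssr:.2f} kb")
```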
The dominant repeat type was AG/CT (1,703) (Figure 2 & Table 2), accounting for 62.38% of all dinucleotide motifs. The A/T (1,222) repeat was the most abundant mononucleotide motif, representing 92.09% of all mononucleotide repeats. There were ten repeat types among the trinucleotide motifs, with AGG/CTT being the most frequent (453, accounting for 29.96% of all trinucleotide repeats). Most of these trinucleotide motifs were rich in A and T, which might arise from the methylation of C bases (Schorderet and Gartler 1992). The other motif types (tetranucleotide, pentanucleotide, and hexanucleotide) occurred at very low frequencies.
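Motif classes such as AG/CT pool a motif with its rotations and its reverse complement, because the same locus reads differently depending on strand and starting position. The sketch below illustrates that grouping together with a simple regular-expression scan for perfect repeats. It is not the SSR detection tool used in this study, which the text does not name; the minimum repeat thresholds in MIN_REPEATS, the function names, and the demo sequence are all assumptions made for illustration.

```python
import re
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed minimum repeat counts per motif length (not taken from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def canonical_motif(motif):
    """Return one representative for a motif, its rotations, and the
    rotations of its reverse complement (e.g. AG, GA, CT, TC -> AG)."""
    motif = motif.upper()
    revcomp = motif.translate(COMPLEMENT)[::-1]
    candidates = []
    for m in (motif, revcomp):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return hits


# Demo sequence, invented for illustration only.
demo = "TTGC" + "AG" * 8 + "CCGT" + "A" * 12 + "GGC" + "CTT" * 5 + "TG"
hits = find_ssrs(demo)
classes = Counter(canonical_motif(motif) for _, motif, _ in hits)
print(hits)
print(classes)
```

Pooling by canonical motif is what allows counts for a class such as AG/CT to be compared across studies regardless of which strand was sequenced.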

SSR primer validation
Initially, 100 primer pairs were randomly selected from the putative SSRs to validate amplification efficacy and polymorphism, of which 30 pairs produced clearly polymorphic banding patterns, as visualized by PAGE. These 30 primer pairs were then screened again using the cDNA of the 17 individual Apocynum collections as templates, and 20 primer pairs showed clear, stable, and polymorphic bands in all 17 collections (Figure 3), giving a polymorphic ratio (proportion of polymorphic loci) of 66.7%. The details of the selected SSR primers and the unprocessed SDS page gels are shown in Fig. SP1 and Table SP2. A total of 136 alleles, with an average of 3.15 alleles per locus (range 2 to 5), were detected (Table 3). Among the 20 primer pairs, primer VH14 detected the most alleles (five), followed by primers VH50, VE4, VE6, VE13, VE19, and VE47 (four alleles each). The PIC (polymorphism information content) values varied from 0.24 to 0.66, with a mean of 0.45. Seven primer pairs were classified as highly polymorphic (PIC > 0.5), 12 as moderately polymorphic (0.25 < PIC < 0.5), and only one as low polymorphic (PIC < 0.25) (Table 3).
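The text does not state which PIC formula was applied. A commonly used definition is that of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2, where p_i is the frequency of allele i at a locus. The sketch below assumes that definition and applies the classification thresholds given above; the function names and the allele frequencies in the example are hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one locus, assuming the
    Botstein et al. (1980) definition; freqs are allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def classify(value):
    """Classify a PIC value using the thresholds stated in the text."""
    if value > 0.5:
        return "high"
    if value > 0.25:
        return "moderate"
    return "low"


# Hypothetical allele frequencies, for illustration only.
for freqs in ([0.5, 0.5], [0.25, 0.25, 0.25, 0.25], [0.9, 0.1]):
    value = pic(freqs)
    print(freqs, round(value, 3), classify(value))
```

Under this definition a locus with two equally frequent alleles cannot exceed a PIC of 0.375, so highly polymorphic markers necessarily carry three or more alleles.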

Genetic diversity analysis
A. hendersonii and A. venetum are widely distributed in China. External factors such as fluctuating climate and environmental conditions can significantly influence genotype or the regulation of gene expression, leading to variability (Nicoló et al. 2013). Analyzing the phylogenetic relationships of the different germplasms is therefore important and may give insight into the correlation between genetic and geographical distance, which may facilitate further use of these genetic resources. A total of 17 Apocynum plants (including 9 A. venetum germplasms and 8 germplasms of its close relative A. hendersonii), collected from the saline and alkaline deserts of the Xinjiang Uygur Autonomous Region and the coastal tidal flats of Shandong Province, were analyzed based on the allele polymorphisms of the selected 20 SSR primers. Our previous study (Gao et al. 2019) showed no apparent difference between the two species' germplasms from the two growing areas, even though the two habitats have markedly different geographical conditions. Various molecular marker systems have been used to assess the genetic diversity and geographical relationships of these species, yet they remain understudied compared with other plant species (Zhang et al. 2013). In addition, few reports have assessed genetic diversity in Apocynum germplasms using SSR markers (Chan et al. 2015; Yuan, Li, and Jia 2020).
The 17 germplasms were grouped into three main clades using PowerMarker V3.25 and the neighbor-joining method. A. venetum and its relative A. hendersonii were divided into separate branches. Within the A. venetum clade, germplasms No. 1 and 2 from Xinjiang and No. 6, 4, 7, 16, 17, 5, 3, and 8 from Shandong fell into two subgroups. As for A. hendersonii, germplasms No. 9, 10, 12, 13, 14, and 15 collected from Xinjiang clustered into the same clade. These results indicate a significant role of geographical distribution in the genetic evolution and diversification of Apocynum species. Germplasm No. 11, collected from the salt pit of Maigaiti, Xinjiang, fell into a separate branch (Figure 4), even though it was classified as A. hendersonii according to morphology. This inconsistency may have resulted from insufficient sampling of genetic resources. The taxonomy of the two Apocynum species remains controversial because of their similar morphological characteristics, low genome heterozygosity, and high self-pollination rate. Our results showed that samples from the two geographic origins in China fell into different clades, indicating that the transcriptome of A. hendersonii and the transcriptome-based SSR markers are useful for assessing geographical distribution and genetic diversity in Apocynum plants.
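Clustering methods such as neighbor-joining start from a matrix of pairwise distances between samples. The text does not state which distance measure was used in PowerMarker, so the sketch below substitutes the Jaccard distance on sets of scored alleles purely as an illustration of how such a matrix can be built from SSR band data. The sample names, allele labels, and function names are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of scored alleles:
    1 - (shared alleles / alleles present in either sample)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Return {(x, y): distance} for every ordered pair of samples."""
    names = sorted(profiles)
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x in names
        for y in names
    }


# Hypothetical allele profiles ("locus:allele"), for illustration only.
profiles = {
    "sample_A": {"L1:a", "L2:b", "L3:a"},
    "sample_B": {"L1:a", "L2:b", "L3:c"},
    "sample_C": {"L1:d", "L2:e", "L3:c"},
}

matrix = distance_matrix(profiles)
for (x, y), d in sorted(matrix.items()):
    if x < y:
        print(x, y, round(d, 3))
```

The resulting matrix could then be passed to any tree-building routine; samples that share more alleles receive smaller distances and therefore tend to be joined earlier.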

Conclusion
A. hendersonii and its relative A. venetum are receiving increasing attention owing to their high medicinal, textile, and eco-economic value. In this study, the cDNA library of A. hendersonii leaf was sequenced to develop transcriptome-based SSR markers. A total of 50,957 unigenes were assembled, and 5,660 SSR loci were identified and characterized. Short repeat motifs (mononucleotide, dinucleotide, and trinucleotide) were the most prevalent (23.45%, 48.23%, and 26.71%, respectively). The dominant repeat motif was AG/CT (1,703), followed by A/T (1,222). Genetic diversity analysis using the selected 20 pairs of SSR primers showed a correlation between the germplasms and their geographical distribution. These newly identified markers may provide a foundation for genetic identification, diversity analysis, and marker-assisted selection breeding in Apocynum species.

Disclosure statement
No potential conflict of interest was reported by the authors.