Mining and characterization of novel EST-SSR markers of Parrotia subaequalis (Hamamelidaceae) from the first Illumina-based transcriptome datasets

Parrotia subaequalis is an endangered Tertiary relict tree from eastern China. Despite its important ecological and horticultural value, no transcriptomic data and limited molecular markers are currently available in this species. In this study, we first performed high-throughput transcriptome sequencing of two individuals representing the northernmost (TX) and southernmost (SJD) population of P. subaequalis on the Illumina HiSeq 2500 platform. We gathered a total of 69,135 unigenes for P. subaequalis (TX) and 84,009 unigenes for P. subaequalis (SJD). From two unigenes datasets, 497 candidate polymorphic novel expressed sequence tag-simple sequence repeats (EST-SSRs) were identified using CandiSSR. Among these repeats, di-nucleotide repeats were the most abundant repeat type (62.78%) followed by tri-, tetra- and hexa-nucleotide repeats. We then randomly selected 54 primer pairs for polymorphism validation, of which 27 (50%) were successfully amplified and showed polymorphisms in 96 individuals from six natural populations of P. subaequalis. The average number of alleles per locus and the polymorphism information content values were 3.70 and 0.343; the average observed and expected heterozygosity were 0.378 and 0.394. A relatively high level of genetic diversity (HT = 0.393) and genetic differentiation level (FST = 0.171) were surveyed, indicating P. subaequalis maintained high levels of species diversity in the long-term evolutionary history. Additionally, a high level of cross-transferability (92.59%) was displayed in five congeneric Hamamelidaceae species. Therefore, these new transcriptomic data and novel polymorphic EST-SSR markers will pinpoint genetic resources and facilitate future studies on population genetics and molecular breeding of P. subaequalis and other Hamamelidaceae species.

, the focal species of our study, is a member of the genus Parrotia C. A. Mey. in the Hamamelidaceae family. This species is a diploid (2n = 2x = 24) deciduous tree characterized by unique exfoliating bark, obovate leaves in green, yellow, red or purple, and distinct apetalous bisexual flowers [1,2]. Therefore, P. subaequalis is widely cultivated as a horticultural and ornamental tree in North America, Europe and East Asia [3,4]. However, the natural population size of the wild P. subaequalis has sharply declined due to its narrow geographic distributions in disjunct montane ecosystems of eastern China, serious habitat destruction and the species' alternate-year fruit production [5,6]. Additionally, as an endangered Tertiary relict tree, P. subaequalis is categorized as 'extremely endangered' by the International Union for Conservation of Nature (IUCN) [7] and the Chinese Plant Red Book (Grade I Key Protected Wild Plants) [8]. Thus, collection of the wild germplasm resources, plant breeding, and improvement of genetic variability of P. subaequalis has been attracting increasing amounts of attention from cultivators and researchers because of its high value in gardening applications and extant endangered survival.
Currently, molecular markers are recognized as a reliable and indispensable approach in studies of plant genetics and breeding. Specifically, molecular markers such as randomly amplified polymorphic DNAs (RAPDs), amplified fragment length polymorphisms (AFLPs), inter-simple sequence repeats (ISSRs), and simple sequence repeats or microsatellites (SSRs), are widely used for genetic diversity assessment, gene mapping, marker assisted selection and molecular breeding [9][10][11]. Compared with other types of molecular markers, SSRs have many advantages in abundance, hypervariability, codominant inheritance and extensive genomic and transcriptomic coverage [12]. Based on the location of the original sequences used to identify simple repeats, SSRs can be divided into genomic SSRs (gSSRs) and expressed sequence tag-simple sequence repeats (EST-SSRs). EST-SSR markers are widely used to investigate the population genetic diversity, marker-assisted selection and molecular breeding because of their higher possibility of being functionally associated with important traits or pathways and higher levels of transferability to related species as compared to gSSRs [13][14][15]. In addition, a series of bioinformatics tools have been developed for automated SSR discovery and marker development, such as CandiSSR, which help users to efficiently identify candidate polymorphic SSRs (PolySSRs) from transcriptome datasets or multiple assembled genome sequences rather than in a traditional time-consuming and labor-intensive way [16]. Moreover, in the last decade, with the advent of high-throughput next-generation sequencing (NGS) technologies, including 454 Life Science GSFLX Titanium and the Illumina platform, we can have access to the abundant genetic resources of the species of interest in a rapid and cost-effective way [17][18][19]. The transcriptome refers to the complete set and quantity of transcripts in a cell at a specific developmental stage or under a physiological condition. NGS-derived transcriptome sequencing produces large EST datasets that are exploited for molecular marker development, novel gene identification and population genetic research related to adaptive traits and pathways [20,21].
To date, there remains a lack of available transcriptomic databases of P. subaequalis and the previously studied types of molecular markers developed for P. subaequalis merely includes ISSRs and gSSRs [6,22]. Thus, it is imperative to enlarge the transcriptomic resources for conservation and marker-assisted breeding of P. subaequalis. In this study, we first sequenced the transcriptomes of two individuals of P. subaequalis from the northern-and southernmost populations on the Illumina HiSeq 2500 platform. The objectives of our study were to (1) provide transcriptomic information for these two P. subaequalis transcriptomes, (2) undertake the mining and characterization of novel polymorphic EST-SSR markers for P. subaequalis based on the two transcriptome datasets, and (3) perform the assessment of genetic variation in six natural populations by EST-SSR markers and their cross-species transferability.

Plant samples, RNA, and DNA isolation
For RNA sequencing, the young and fresh leaves of two individuals of P. subaequalis were collected from two natural populations: TX in Anhui Province and SJD in Jiangsu Province (China) (Fig 1 and S1 Table), respectively. Our field work in TX population was under the authority of Tianxia Mountain Landscape Management Administration; And Shanjuan Cave Scenic Spot Management Administration gave the permission of our collection in SJD population. The leaves were chosen to represent the northern-and southernmost distribution of P. subaequalis. Before RNA extraction, all samples were immediately frozen in liquid nitrogen and stored at -80˚C. Total RNA for each individual was extracted using TRIzol Reagent (Invitrogen Life Technologies, Carlsbad, California, USA) and treated with DNase (TakaRa Bio, Shuzo, Kyoto, Japan) following the manufacturer's instructions. The integrity of the RNA was evaluated by agarose gel electrophoresis and validated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). The concentration of RNA was measured using a NanoDrop LITE spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). To evaluate the polymorphisms of the EST-SSR markers developed from our transcriptome datasets and analyze the population genetic diversity of P. subaequalis, we collected samples from a total of 96 individuals of P. subaequalis from six natural populations ( Table) were further selected for tests of cross-amplification of the polymorphic EST-SSR markers; for each species, five accessions were used. No specific permissions were required for these species' collection, for they didn't belong to the endangered or protected species. Representative voucher specimens of all plant materials were deposited in the Herbarium of Zhejiang University (HZU). Total genomic DNA was extracted from silica gel-dried leaves with Plant DNAzol Reagent (Invitrogen) following the manufacturer's protocol. The quality of DNA was examined on 0.8% agarose gels stained with 1×GelRed (Biotium) and the concentration was checked using a NanoDrop LITE spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA).

Transcriptome sequencing, de novo assembly and annotation
Two next-generation sequencing (NGS) cDNA libraries were normalized using a NEBNext UltraTM RNA Library Prep Kit for Illumina (New England Biolabs, MA, USA) [23]. The mRNAs of each sample were purified and enriched using poly-T oligo-attached magnetic beads. The two cDNA libraries were then pooled together and sequenced in one lane of the HiSeq 2500 platform (Illumina Inc., San Diego, California, USA) at Beijing Genomics Institute (BGI, Shenzhen, China). The base calling and quality value calculations were performed using Illumina GA Pipeline version 1.6. After filtering the adaptor contamination and low-quality reads by Trimmomatic [24], the clean reads were assembled into transcripts using Trinity version 2.5 with the default parameters [25]. TGICL software [26] was then used to cluster similar transcripts, which generated non-redundant transcripts defined as unigenes for two individuals of P. subaequalis (Table 1). To annotate and identify the putative function of the unigenes, these sequences were subjected to a BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) search with a cut off E value of 10 −5 against the following databases: National Center for Biotechnology Information (NCBI) non-redundant protein sequences (Nr), Swiss-Prot protein (http:// www.expasy.ch/sprot/), NCBI non-redundant nucleotide sequences (Nt), Eukaryotic Orthologous Groups of proteins (KOG) database (http://www.ncbi.nlm.nih.gov/KOG), protein sequence analysis and classification (InterPro) database (http://www.ebi.ac.uk/interpro/) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (http://www. genome.jp/kegg). In addition, gene ontology (GO) terms describing molecular functions, cellular components, and biological processes were assigned using the BLAST2GO program (B2G; http://www.blast2go.com) for further annotation of the unigenes in our study.

Mining of EST-SSR markers and primer design
The potential polymorphic EST-SSR loci of P. subaequalis were identified from our two nonredundant unigenes datasets using CandiSSR [16]. The parameters used in CandiSSR were as follows: the flanking sequence length of 100, blast e-value cutoff of 1e-10, blast identity cutoff of 95, and blast coverage cutoff of 95. For each target EST-SSR, primers were automatically designed in the pipeline based on the Primer3 package [27] with default settings: PCR product size of 100 to 300 base pairs (bp), primer length of 18-25 bp, annealing temperature between 48 and 60˚C, and CG content from 40 to 60%. The OLIGO version 6.67 (Molecular Biology Insights, Inc., Cascade, Co, USA) was then used to check for hairpin structures, potential primer dimers and the occurrence of mismatch of the above designed primer pairs.

EST-SSR polymorphism validation and transferability test
Based on the proportion of different EST-SSR repeats, we randomly chose 54 candidate polymorphic primer pairs for initial tests of amplification availability and optimal annealing temperature of each primer pair using six samples (one individual per population) of P. subaequalis. The gradient PCR amplifications were performed on a GeneAmp9700 DNA Thermal Cycler (Perkin-Elmer, Waltham, Massachusetts, USA) following the standard protocol of the AmpliTaq Gold 360 Master PCR kit (Thermofisher Biotech Company, Applied Biosystems, Foster City, California, USA) in a final volume of 15 μL, which contained 1 μL (50 ng) of genomic DNA, 7.5 μL AmpliTaq Gold 360 Master Mix, 5.5 μL of deionized water, and 0.5 μL of forward and reverse primers (10 μM). The procedure of PCR was 5 min initial denaturation at 95˚C, 35 cycles of 45 s at 95˚C, a temperature gradient for annealing from 48˚C to 60˚C for 30 s and 30 s synthesis at 72˚C followed by a final 10-min extension step at 72˚C and a 4˚C holding temperature. The polymorphisms of the above successfully amplified loci were screened by means of fluorescence-based genotyping using 96 individuals from six natural populations of P. subaequalis. For all loci, the 5' end of each forward primer was tagged with one of four fluorescent dyes (FAM, ROX, HEX or TAMRA), and PCR amplifications were performed on a Gen-eAmp9700 DNA Thermal Cycler (Perkin-Elmer, Waltham, Massachusetts, USA) using reaction volumes of 15 μL including 1μL (50 ng) of template DNA, 7.5 μL AmpliTaq Gold 360 Master Mix, 5.5 μL of deionized water, and 0.5 μL of reverse and fluorophore-labeled forward primers (10 μM). PCRs were run following an endpoint PCR procedure with initial denaturation for 5 min at 95˚C followed by 35 cycles of 95˚C for 45 s, 30 s annealing at the optimal primer temperature ( Table 2) and 30 s synthesis at 72˚C; ending with a final 10-min extension step at 72˚C and a 4˚C holding temperature. PCR products were analyzed on an ABI 3730XL DNA Analyzer (Applied Biosystems, Foster City, California, USA) with GeneScan LIZ 500 as an internal reference (Applied Biosystems). Electrophoresis peaks scoring and polymorphism identification were assayed by using GeneMarker v2.2.0 (SoftGenetics, State College, Pennsylvania, USA). All primer sequences obtained from this study were submitted to GenBank (Table 2).

Evaluation of population genetic diversity and variation and test of bottleneck effect
The number of alleles (A), observed heterozygosity (H o ), expected heterozygosity (H e ) and polymorphism information content (PIC) were calculated for each locus and population using CERVUS v3.0 [28]. FSTAT v2.9.3 [29] were employed to estimate the following genetic diversity parameters of each locus and six natural populations of P. subaequalis: total genetic diversity for the species (H T ), genetic diversity within populations (H S ) and population genetic differentiation coefficients (F ST and G ST ). The frequency of null alleles and their bias on genetic diversity were evaluated based on the expectation maximization method implemented in FreeNA [30]. Deviation from Hardy-Weinberg equilibrium (HWE) for each population and linkage disequilibrium for each primer pair were tested using a Markov chain (dememorization 1,000, 100 batches, 1,000 iterations per batch) through GENEPOP v4.2 [31]. Analysis of molecular variance (AMOVA) was performed to partition the total genetic variance among and within populations using ARLEQUIN v3.11 [32]. The program BOTTLENECK v1.2.02 [33] was used to detect the population bottleneck effect (i.e. reductions in effective population size) over past or more recent time scales under three different models of microsatellite evolution (Infinite allele model, IAM; Stepwise mutation model, SMM; Two-phased model of mutation, TPM).

De novo assembly of Parrotia subaequalis transcriptome datasets and functional annotation of unigenes
Using Illumina high-throughput RNA sequencing technology, a total of 26,037,119 and 26,666,948 raw reads (of 125 bp length) were generated for P. subaequalis (TX) and P. subaequalis (SJD), respectively. After stringent quality inspection and data filtering, 25,448,383 and 26,066,749 high-quality clean reads were obtained for P. subaequalis (TX) with 97.94% Q20 bases (base quality greater than 20) and P. subaequalis (SJD) with 98.21% Q20 bases. The total length of the clean reads was 3.62 Gb for P. subaequalis (TX) and 3.91 Gb for P. subaequalis (SJD). The GC percentage of P. subaequalis (TX) and P. subaequalis (SJD) were 46.48% and 46.54% (Table 1). The two raw sequencing datasets were uploaded to the NCBI Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/Traces/sra; Biosample accession SAMN10502180 for P. subaequalis (TX) and SAMN10509852 for P. subaequalis (SJD)).
With the help of Trinity version 2.5, these above clean reads were de novo assembled into 117,794 transcripts with an average length of 674 bp and an N50 length of 1268 bp for P. subaequalis (TX) and 145,619 transcripts with an average length of 672 bp and an N50 length of 1245 bp for P. subaequalis (SJD) ( Table 1). Subsequently, using TGICL software, we gathered 69,135 unigenes with an average length of 890 bp and an N50 length of 1591 bp for P. subaequalis (TX) and 84,009 unigenes with an average length of 887 bp and an N50 length of 1602 bp for P. subaequalis (SJD) ( Table 1). Among the unigenes in P. subaequalis (TX), the length of 48,285 unigenes (69.84%) ranged from 300 to 1000 bp, while the other 20,850 unigenes (30.16%) were more than 1000 bp in length. Of the unigenes in P. subaequalis (SJD), 59,643 unigenes (70.00%) had a length range of 300 to1000 bp, and 24,366 unigenes (30.00%) had a length of more than 1000 bp. The length distributions of these two unigene datasets are shown in Fig 2. Sequence similarity searching was conducted using the BLAST algorithm specifying E values of less than 10 −5 for annotation of unigenes. We used the BLAST2GO program to annotate and analyze the function of unigenes in two individuals of P. subaequalis against the GO database. It comprehensively classifies the properties of genes into three categories: biological process, cellular components and molecular function. Based on sequence similarity, 21,004 unigenes (30.38%) in P. subaequalis (TX) and 23,624 unigenes (28.12%) in P. subaequalis (SJD) were classified into three main GO categories and 55 sub-categories (Fig 3 and S2 and S3 Tables). For the two individuals of P. subaequalis, the three largest sub-categories of biological process were "metabolic process", "cellular process" and "single-organism process"; Of the cellular components, "cell", "cell part" and "membrane" were the most highly represented terms. Among fourteen different molecular function categories, "catalytic activity" and "binding" were the two most matched classes (Fig 3 and S2 and S3 Tables).
Furthermore, KEGG analysis was used to help us focus on the biological pathways and functions of the gene products of P. subaequalis. The results showed that 28,306 unigenes (40.94%) in P. subaequalis (TX) and 32,322 unigenes (38.47%) in P. subaequalis (SJD) were grouped into 21 biological pathways that fell under six larger groups (cellular processes, environmental information processing, genetic information processing, human disease, metabolism and organismal systems) (Fig 4 and S4 and S5 Tables). Among these 21 pathways, "global and overview maps", "carbohydrate metabolism", "translation", "folding, sorting and degradation", "amino acid metabolism" and "signal transduction" were the major biological pathways in the two individuals of P. subaequalis (Fig 4 and S4 and S5 Tables).

Polymorphisms and transferability assessment of EST-SSR markers
Of 497 candidate polymorphic EST-SSRs, primer pairs were designed for 488 EST-SSR loci (98.19%; S7 Table). The remaining loci were inappropriate for primer modeling or the DNA flanking sequences of these loci were too short to design primer pairs. From the 488 primer pairs, based on the proportion of different EST-SSR repeats, we randomly chose 54 primer pairs (S7 Table) for initial testing using six individuals (one sample per population) of P. subaequalis to ensure the availability and optimal annealing temperature of these primer pairs. After excluding those that gave poor amplification or produced a complex pattern with multiple bands in an initial screening, 44 primer pairs were selected for further tests of polymorphism and transferability.
To validate the polymorphisms of these 44 EST-SSR loci, fluorescence-based genotyping was performed using 96 individuals from six natural populations of P. subaequalis. Finally, 27 polymorphic primer pairs were selected for transferability and further population genetic studies, and all of these EST-SSR sequences have been deposited in GenBank with the accession numbers from MK238352 to MK238378 (Table 2). All EST-SSR markers were successfully cross-amplified and exhibited polymorphisms in five congeneric Hamamelidaceae species except for one loci (PasE380) for Sycopsis sinensis and two loci (PasE188 and PasE380) for Hamamelis virginiana, showing a transferability rate of 92.59% (Table 3).

Characterization of EST-SSR markers and population genetic diversity and variation
As a result, these above 27 polymorphic EST-SSRs in total yielded 100 alleles with an average of 3.70 alleles and a range of 1 to 8 alleles per locus. The polymorphism information content per locus over all populations varied from 0.060 to 0.597, and the observed and expected The transcriptomes and novel EST-SSRs of Parrotia subaequalis  Table 4). And a high frequency of null alleles was detected in PasE188 and PasE425 (v>5%) for the 27 EST-SSR loci. No significant linkage disequilibrium was observed for any pair of loci. Three loci deviated significantly from HWE expectations (P < 0.001) in some populations (PasE20 in HBS; PasE180 in HBS and ZXC; PasE368 in HBS, TX and LWS), which might be due to the Wahlund effect of specific populations (Table 4).
At the level of the species, our results showed that total genetic diversity of P. subaequalis (H T ) was 0.393 and genetic diversity within populations (H S ) was 0.336 (S8 Table). Overall, F ST and G ST across the six natural populations of P. subaequalis were 0.171 and 0.147, representing a much higher genetic differentiation between populations. The AMOVA revealed that 16.74% of the total variation was attributed to differences among six populations and that 83.26% was contributed by differences within populations (P < 0.001; S9 Table), indicating the genetic variation of P. subaequalis mainly existed in individuals within populations. Note: ─ = amplification failed. a Voucher and locality information are provided in S1 Table. https://doi.org/10.1371/journal.pone.0215874.t003

Parrotia persica (N = 5)
The transcriptomes and novel EST-SSRs of Parrotia subaequalis Table 4 Besides, bottleneck analysis found only one population of P. subaequalis in Zhuxian Village of Anhui Province (ZXC) could have experienced the significant recent bottleneck under the three different models of microsatellite evolution (S10 Table).

Characterization of the Parrotia subaequalis transcriptome using nextgeneration sequencing technologies
In recent years, the use of next-generation sequencing (NGS) technologies have become increasingly prevalent because of its high-throughput genomic and transcriptomic data output for model or non-model organisms at reasonable prices and schedules [34][35][36]. In the present study, we characterized the transcriptomes of two individuals of P. subaequalis using RNAsequencing technology on the Illumina HiSeq 2500 platform for the first time. Raw data of these two transcriptomes are currently available to the public. Approximately five Gb of data length for each individual of P. subaequalis were generated and assembled into unigenes. As a result, the mean length of the unigenes of P. subaequalis (TX) was 890 bp and 887 bp for P. subaequalis (SJD), suggesting that the large number of reads with paired-end information and high sequencing depth produced much longer unigenes than reported in previous transcriptome studies of Neolitsea sericea (mean length 733 bp) [37], Sesamum indicum (mean length 629 bp) [38] and Pennisetum purpureum (mean length 586 bp) [39].
In terms of the annotation of unigenes, the results showed a large part of unigenes (62.17% in P. subaequalis (TX) and 55.24% in P. subaequalis (SJD)) had homologs in public databases like Nr, Nt, InterPro, KOG, KEGG, Swiss-Port and GO. These annotated unigenes could provide valuable information for future studies on P. subaequalis. A minority of the unigenes (37.83% in P. subaequalis (TX) and 44.76% in P. subaequalis (SJD)) failed to match any proteins in the above public databases, which may be attributable to the large amount of shortlength (< 500 nt) unigenes (Fig 2) or the limited publicly available genomic and transcriptomic information for P. subaequalis. Further explanations for the low hit possibility of short sequences were the lack of a characterized protein domain or the short query sequences [38], resulting in false-negative results. GO is a worldwide classification database for gene function; in our study, "metabolic process", "cellular process", "catalytic activity" and "binding" were the four most matched categories in two individuals of P. subaequalis (Fig 3 and S2 and S3 Tables). Additionally, KEGG analysis of the annotated unigenes showed that "global and overview maps", "carbohydrate metabolism", "translation", "folding, sorting and degradation", "amino acid metabolism" and "signal transduction" were the primary biological pathways in the two individuals of P. subaequalis (Fig 4 and S4 and S5 Tables). Overall, these findings here will greatly enrich the transcriptomic resources for further research on gene discovery, molecular mechanisms and biological pathways of P. subaequalis.

Mining and utilization of polymorphic EST-SSR markers in conservation genetics
Prior to our study, molecular marker studies of P. subaequalis were conducted with ISSR, chloroplast SSR and nuclear SSR [6,22], while no EST-SSR markers had been reported. EST-SSR markers are powerful molecular markers for analyzing population genetic diversity, cross transferability rate, molecular breeding and functions [40,41]. With the wide application of the NGS technologies, the increasing number of transcriptome sequences have provided abundant resources for EST-SSR applications for research and genetic improvements. In addition, a number of bioinformatics software have been developed for SSR mining, such as MISA [42] and SSR Primer [43]. However, to date, these tools have not integrated a computational solution for systematic assessment of SSR polymorphic status, resulting in poor efficiency of polymorphic SSR identification and time-consuming experiments. The newly developed pipeline, CandiSSR, could help users detect candidate polymorphic SSRs with high efficiency [16]. Therefore, in the present study, using CandiSSR, we successfully and efficiently mined 497 candidate polymorphic EST-SSR markers from the two comparative transcriptomic datasets. Then, 54 randomly chosen primer pairs were used for validation of the polymorphism, and 27 primer pairs (50%) were proven to be polymorphic among 96 individuals from the six natural populations. Such high success ratios indicated that this kind of molecular development method with the aid of CandiSSR was highly efficient and considerably successful.
Furthermore, using the 27 polymorphic EST-SSR markers, 100 alleles were found across the 96 individuals of P. subaequalis from six natural populations. The range of the number of alleles per locus was from 1 to 8 with a mean of 3.70 alleles, which was lower than the range from 2 to 14 and mean of 5.33 alleles in the gSSRs of P. subaequalis [6]. The average gene diversity (H e ) and PIC value of the 27 polymorphic EST-SSR markers were 0.394 and 0.343, representing a moderate level of gene polymorphism compared to the gSSRs (mean: H e = 0.558; PIC = 0.515) reported in Zhang et al. [6]. We observed a considerably higher level of transferability (92.59%) in five congeneric Hamamelidaceae species than the gSSR (66.67%) reported by Zhang et al. [6]. The much higher level of cross-transferability and the slightly lower degree of gene polymorphism of EST-SSRs than of gSSRs reflected the highly conserved character of the flanking sequences of EST-SSRs and the low mutation frequency of EST sequences. Additionally, our EST-SSR survey of six natural population of P. subaequalis revealed a relatively high level of genetic diversity (H T = 0.393; H S = 0.336; S8 Table) and a little higher genetic differentiation level (F ST = 0.171; S8 Table) at the level of species, suggesting P. subaequalis maintained high levels of species diversity in the long-term evolutionary history despite its restricted and highly disjunct distribution range. The observation of genetic diversity and bottleneck test among six wild P. subaequalis populations indicated that WFS was the most variable population, while SJD and ZXC was the two more endangered populations that we should pay more attention to their protection and preservation. The Wangfo Mountain (WFS) was considered as one of the biodiversity refugia since the Tertiary in China [50,51] and few human activities were found there, thus contributing the highest genetic diversity in WFS population in some degree. While based on our field observations, SJD population was located in the scenic area of Shanjuan Cave and the population ZXC lied in a village, the human interference including farming and foresting may result in the lower level of genetic diversity and recent bottleneck. In summary, the polymorphic EST-SSR markers developed here will provide a powerful tool for further studies on conservation genetics and molecular breeding of P. subaequalis and other Hamamelidaceae species.