Phylogeographic and phylogenetic analysis for Tripterygium species delimitation

Abstract Tripterygium wilfordii (Celastraceae) is a traditional Chinese medicine; and the dried root and rhizome constitute the main officinal parts. Tripterygium wilfordii has been identified as a potential candidate for the treatment of systemic lupus erythematosus, rheumatoid arthritis, nephritis, asthma, leprosy, and cancer. The phylogenetic relationships within the Tripterygium genus are ambiguous; thus, our aim is to clarify the relationships within this genus using phylogeographic and phylogenetic analyses. Here, we first sequenced three plastid DNA regions (i.e., psbA‐trnH, rpl32‐trnL, and trnL‐trnF) and found that Tripterygium hypoglaucum and T. wilfordii were clustered together based on the strength of the topology in the phylogenetic analysis: T. hypoglaucum is polyphyletic, and T. wilfordii is paraphyletic. A spatial analysis of molecular variance showed that the best group value is 4, and the groups were almost consistent with the topology of in the phylogenetic analysis. The Mantel analyses of Tripterygium using IBD web showed statistically significant relationships between genetic and geographical distance distributions (r = .3479, p < .0001). The molecular dating using Fossil calibration indicated that the divergence in Tripterygium was approximately 8.13 Ma. Furthermore, we also analyzed four DNA regions (i.e., ITS2, psbA‐trnH, matK, and rbcL) that were obtained from the NCBI nucleotide database; these results showed that T. wilfordii and T. hypoglaucum clustered together, while Tripterygium regelii represented a separate cluster. Tripterygium hypoglaucum and T. wilfordii were never distinct lineages, and the species circumscriptions are artificial. We propose that T. wilfordii and T. hypoglaucum are conspecific, while T. regelii likely constitutes a separate species.


| INTRODUCTION
Tripterygium wilfordii, which is a perennial woody vine member of the Celastraceae family that is native to eastern and southern China, Taiwan, Korea, and Japan, has been used as a traditional and allopathic Chinese herb for centuries (Brinker, Ma, Lipsky, & Raskin, 2007).
Triptolide is a primary diterpene triepoxide that has been shown to have multiple important pharmacological effects, such as anti-inflammatory activity and the inhibition of lymphocyte proliferation; Triptolide has also been used for the management of autoimmune diseases (Liu, 2011). Certain key enzyme genes in terpenoid biosynthesis in T. wilfordii Hook. f. such as 1-deoxyd-xylulose-5-phosphate synthase (DXS), 1-deoxyd-xylulose-5-phosphate reductoisomerase (DXR), farnesylpyrophosphate synthase, and 3-hydroxy-3-methylglutaryl-CoA synthase (HMGS) genes have been have been well studied Tong et al., 2015;Zhao et al., 2015). Minnelide, which is a water-soluble triptolide analog that is extracted from Tripterygium, has been shown to be highly effective in reducing pancreatic tumor growth and spread in mouse models (Chugh et al., 2012) and in suppressing small cell lung carcinoma tumor growth in animal models (Kumar, Corey, Scott, Shiva, & D'Cunha, 2016;Rousalova et al., 2013). (5R)-5-Hydroxytriptolide (LLDT-8) is a new, optimized triptolide analog that shows lower cytotoxicity and relatively higher immunosuppressive activity (Shen et al., 2015). Moreover, triptonide can effectively inhibit canonical Wnt/  (Hance, 1880;Hutchinson, 1917;Ma, Brach, & Liu, 1999;Loesener & Theodor, 1913;Ohwi, 1932;Sprague & Takeda, 1912). The traditional morphological characteristics that are often used to separate the species within a genus are highly variable and cannot be used to separate these species; according to these characteristic, this genus comprises only one variable species (Ma et al., 1999). However, in the Germplasm Resource Information Network (https://www.arsgrin.gov/), Tripterygium contains the following four species: T. wilfordii Hook. f, Tripterygium f in the subsequent publication "Flora of China." Unifying the species is often practical when addressing the problem of species delimitation (Wiens, 2007); however, incorrect species delimitation may lead to serious problems in further studies, and could result in a waste of money and effort.
Morphological data are responsible for most of our knowledge regarding the phylogeny of life (Scotland, Olmstead, & Bennett, 2003).
However, there are limitations to these type of data, such as phenotypic plasticity, genetic variability (de Boer, Ichim, & Newmaster, 2015), and discrimination among cryptic species (Packer, Gibbs, Sheffield, & Hanner, 2009). Recently, DNA sequences have played an important role in species identification. The method used to identify organisms based on their DNA sequences has been coined DNA barcoding by Hebert, Cywinska, Ball, and Waard (2003) and includes the mitochondrial gene cytochrome c oxidase I gene as a standard barcode that is applied to all animals. Although many researchers have attempted to establish a universal plant barcode, none of the available sequences could be used for all species. The Barcode of Life-Plant Working Group has shown that the matK + rbcL pair is a useful plant barcode, with a discriminatory efficiency of 72 % (Group, 2009).
A quantitative analysis of 19 compounds has shown that there is a significant difference in the compound content between T. wilfordii and T. hypoglaucum (Guo, Duan, Liu, Liu, & Li, 2014). However, a content analysis of triptolide in different populations and individuals of Tripterygium suggested that triptolide in T. regelii was very low, and there was no significant difference between T. wilfordii and T. hypoglaucum (Huang, Guo, & Si, 2005). Furthermore, Chen et al. (2017) have shown that there was a significant difference (p < .0001) among the contents of 11 chemical components in T. hypoglaucum, T. wilfordii, and T. regelii. Molecular analyses using ITS and 5S rDNA sequences have shown that T. hypoglaucum is not distinct from T. wilfordii, while T. regelii can be recognized as an independent species (Law et al., 2011). Phylogenetic analyses of the Celastrus genus have shown that, based on the sequences of two nuclear (ETS and ITS) and three plastid (psbA-trnH, rpl16 and trnL-F) sequences, Celastrus and Tripterygium (T. wilfordii and T. regelii) form a maximally supported clade (Mu, Zhao, & Zhang, 2012). In addition, an RAPD marker (Liu, Guo, Huang, & Si, 2007) experiment was conducted to study the genetic relationships and diversity in the Tripterygium genus. These experiments showed that five T. regelii individuals gathered closely to form a single branch that was separated from the other populations, and that T. wilfordii and T. hypoglaucum could be combined into the same species. Further analyses using four DNA regions (i.e., ITS2, matK, rbcL, and psbA-trnH) for species identification in Tripterygium showed that T. regelii clustered alone, whereas T. hypoglaucum clustered with T. wilfordii, suggesting that T. hypoglaucum and T. wilfordii are potentially conspecific or that the limited barcode did not provide enough variation for their differentiation (Zhang et al., 2016).
Here, our purpose is to clarify the species delimitation in the Tripterygium genus. We sequenced and combined three plastid DNA regions (i.e., psbA-trnH, rpl32-trnL, and trnL-trnF) as the first matrix, which was used in the population genetic analysis, phylogeographic analysis, phylogenetic analysis, and divergence time estimates. The other four DNA sequences (i.e., ITS2, psbA-trnH, matK, and rbcL) were downloaded from the National Center for Biotechnology Information (NCBI) nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore) and combined as the second matrix, which was used in the phylogenetic analysis, including Bayesian inference (BI), maximum likelihood (ML), and haplotype network construction. Plastid DNA fragments are usually inherited maternally in angiosperms; however, the nuclear rDNA internal transcribed spacer is derived from both parents (McCauley, Sundby, Bailey, & Welch, 2007). The results show that T. wilfordii and T. hypoglaucum could be considered the same species, which is distinct from T. regelii. A spatial analysis of molecular variance (SAMOVA) showed that the best group value in Tripterygium is 4, the group from the southwest contains all specimens from T. wilfordii and some specimens from T. hypoglaucum, and all specimens of T. regelii converge into northeast group alone. The NST value (0.854) was significantly (p < .05) greater than the GST (0.561), suggesting that there was a significant phylogeographic structure in Tripterygium. The Mantel analyses at the genus level indicated statistically significant relationships in the combined cpDNA sequences (r = .3479, p < .0001) between the genetic and geographical distance distributions. The molecular dating using Fossil calibration estimated that the divergence in Tripterygium was approximately 8.13 Ma.

| Sample preparation
Leaf samples were collected from three species in the Tripterygium genus and the species Celastrus orbiculatus. First, we collected T. wilfordii seedlings from different places in southern China and planted the seedings in Datian (Fujian Province in China) for further applications. In all cases, fresh leaves were plucked for the DNA extraction and stored at −80°C in the laboratory. Finally, 120 samples (Table S1) were collected as follows: 10 T. hypoglaucum samples, 10 T. regelii samples, 94 T. wilfordii samples, and 6 C. orbiculatus samples. The ITS2, psbA-trnH, rbcL, and matK intergenic region sequences in Tripterygium and C. orbiculatus were downloaded from the NCBI nucleotide database (Table S2).
The PCR products were detected by electrophoresis on a 1% agarose gel containing GeneGreen Nucleic acid dye in TAE buffer; the fragments were visualized using the Vilber Lourmat imaging system. All PCR products were sequenced in both directions using an ABI 3730XL automated sequencer (Applied Biosystems) at the Beijing RUIBO Biotech Company.

| Sequences
The forward and reverse trace files that resulted from the sequencing were trimmed, visually inspected and manually adjusted using SeqMan version 7.1.0 (Burland, 2000). We removed the poor quality bases that were located at the 5′ and 3′ ends of the three types of PCR products. The individual alignments of the three cpDNA regions were then combined to produce a three-gene alignment for all 120 samples using Editseq version 7.1.0 (DNASTAR, Madison, WI). A multiple sequence alignment was performed using MEGA version 7.0.18 (Kumar, Stecher, & Tamura, 2016). Haplotypes whose sites with gaps or missing were considered, invariable sites were included, and parsimony informative sites and invariable sites were analyzed using DNASP version 5 (Librado & Rozas, 2009). We also calculated haplotype diversity (H d ) in Tripterygium using DNASP version 5 (Librado & Rozas, 2009). For the haplotype data set, the net between-group mean distance that was used in the p-distance model was calculated using MEGA version 7.0.18 (Kumar, Stecher, et al., 2016). In addition, Network version 5.0.0.0 (Fluxus Technology Ltd. 1999-2016) was used to construct a haplotype network based on the Median Joining (MJ) algorithm (Bandelt, Forster, & Röhl, 1999).

| Population genetic and phylogeographic analyses of psbA-trnH + rpl32-trnL + trnL-trnF
The average within-population diversity (HS), total gene diversity (HT), geographical total haplotype diversity (VT), geographical average haplotype diversity (VS), and differentiation among the populations (GST and NST) in Tripterygium were estimated based on the method proposed by Petit, 1995, 1996) using PermutCpSSR (http://www.pierroton.inra.fr/genetics/labo/ Software). GST was dependent only on the haplotype frequencies, while NST also depends on the haplotype frequencies between the haplotypes. To identify possible population groups that are geographically homogeneous and maximally differentiated in 40 different locations in Tripterygium, a spatial analysis of the molecular variance (SAMOVA) was carried out using SAMOVA 2.0 (Dupanloup, Schneider, & Excoffier, 2002). The molecular distance was set to the pairwise difference, and the number of initial configuration groups was set to 100 with 1,000 simulated annealing steps. To select the most appropriate number of groups, the K-values ranged from 2 to 35.
While the rate of FCT change began to decline between successive Kvalues, the best grouping of populations was determined (Blair et al., 2013). Analyses of molecular variance (AMOVA) (Excoffier, Smouse, & Quattro, 1992) about genetic structure of the populations were performed in order to analysis value of variation within populations (FST), among populations within groups (FSC), and among groups (FCT).
We estimated the associations between a genetic matrix based on the group average distance, which was calculated using MEGA7.0 with 1,000 bootstrap and a p-distance model about of 40 locations in Tripterygium, and a matrix of the geographical distance (km) based on Mantel's (1967) method using IBDWS (http://ibdws.sdsu.edu/∼ibdws/distances.html) (Jensen, Bohonak, & Kelley, 2005) with 10,000 random permutations.

| Phylogenetic analysis and divergence time estimates
The BI and ML methods were used for constructing phylogenetic trees based on haplotype data sets of the combined cpDNA sequences. For the BI analysis, the model selection was based on the Akaike information criterion (AIC) using the program MrModelTest version 2.3 (Nylander, 2004). BI was analyzed using MrBayes version 3.2.6 (Huelsenbeck & Ronquist, 2001;Ronquist & Huelsenbeck, 2003;Ronquist et al., 2012) with C. orbiculatus as the outgroup. Two independent Bayesian Markov chain Monte Carlo (MCMC) analyses were conducted simultaneously. In these analysis, we used 2 × 10 6 generations, sampled every 100 generations, and 25% (=5,000) of the trees were discarded as burn in with four MCMC chains using three heated chains and a cold chain. The verified convergence was shown in MrBayes and the average standard deviation of the split frequencies below 0.05. We set the Temp parameter as 0.05 so that we could control the heating coefficient. For the ML analysis, we implemented an automatic model selection (Lefort, Longueville, & Gascuel, 2017) with AIC using a Subtree Pruning and Regrafting ML heuristic with 1,000 bootstrap replications using PhyML version 3.0 (Guindon et al., 2010). The phylogenetic analysis of all psbA-trnH + rpl32-trnL + trnL-trnF sequence combinations and the ITS2 + psbA-trnH + matK + rbcL sequence combinations is shown in Supplementary 3.
We estimated the time of the most recent common ancestor (TMRCA) of the haplotypes (psbA-trnH + rpl32-trnL + trnL-trnF) using Beast version 2.3 (Drummond, Suchard, Xie, & Rambaut, 2012). The best substitution model was similar to that in the BI phylogenetic analysis.

| Specimens and sequence analysis
One hundred and twenty samples were collected from 12 provinces in China (the geographical distribution is shown in Figure 1

| Haplotype distribution and network
The haplotype distribution was surveyed across the 120 individuals sequenced in this study. Eighteen haplotypes (cHap1-cHap18) resulted from the combined cpDNA matrix (Table 2). Fifteen haplotypes (zHap1-zHap15) resulted from the combined DNA (ITS2, psbA-trnH, matK, and rbcL) matrix (Table 3). The haplotype networks are shown in Figure 2. These haplotype networks revealed a direct connection between T. wilfordii and T. hypoglaucum, whereas T. regelii constitutes a distinct isolate.

| Phylogenetic analysis and divergence time estimates
The BI haplotype phylogenetic trees were based on the psbA-trnH + rpl32-trnL + trnL-trnF regions and the combination of the ITS2 + psbA-trnH + matK + rbcL regions and are presented in Figure 3.

| DISCUSSION
Every species spans a certain geographical range, and the ecological and evolutionary processes that limit the geographical range may be crucial for creating new species, particularly allopatric species (Wiens, 2007). Tripterygium wilfordii Hook. f. is native to Taiwan, High levels of genetic diversity (HT = 0.981) were detected in the Tripterygium combined cpDNA using PermutCpSSR. Narrowly distributed species may have lower levels of genetic diversity than widespread species (Cole, 2003;Hamrick & Godt, 1996). The wide distribution of Tripterygium in south and northeast China potentially lead to a high level of genetic diversity. We found that the sig- China has a long history and a rich culture of herbal medicines, and medicinal plants account for 11,146 species from 2,309 genera and 383 families (Chen et al., 2010). In the herbal products market, herbal medicinal materials are often substituted by herbs from related species  (Ming, Cao, But, & Shaw, 2011). DNA barcoding may, thus, provide a feasible and cost-effective tool for species authentication and the monitoring of the herbal products. DNA barcoding can potentially be used to determine the identity of plant the species used in herbal medicines that may cause adverse drug reactions, although none of the available loci apply to all species (Li et al., 2015). To conduct the phylogeographic and phylogenetic analyses for the Tripterygium species delimitation, we combined the following three cpDNA regions: psbA-trnH, rpl32-trnL, and trnL-trnF.
Currently, psbA-trnH is widely used for DNA barcoding due to its highly conserved coding sequence (which facilitates primer design) (Shaw et al., 2005); psbA-trnH can be easily amplified across a broad range of nearly all angiosperms (Shaw et al., 2007). The psbA-trnH cpDNA intergenic region exhibits a large number of insertions/deletions (Kress & Erickson, 2007). The rpl32-trnL intergenic spacer, which is located in a small single-copy region of the chloroplast genome, has also been used in sequence-based phylogenetic analyses of Cistus creticus (Falchi et al., 2009), the Antirrhineae family (Yousefi, Zarre, & Heubl, 2016), the Ombrocharis genus (Chen et al., 2016), and the American Vernonieae tribe (Loeuille, Keeley, & Pirani, 2015). TrnL-trnF is also a chloroplast spacer that has been frequently used for studying phylogenetic relationships and species identification (Shaw et al., 2007).