Tertiary Origin and Pleistocene Diversification of Dragon Blood Tree (dracaena Cambodiana-asparagaceae) Populations in the Asian Tropical Forests

Background: The origin of extraordinarily rich biodiversity in tropical forests is often attributed to evolution under stable climatic conditions over a long period or to climatic fluctuations during the recent Quaternary period. Here, we test these two hypotheses using Dracaena cambodiana, a plant species distributed in paleotropical forests.


Introduction
Understanding the origin of extraordinary biological diversity in tropical forests remains as one of the greatest challenges in evolutionary biology. In general, two contrasting hypotheses have been put forward to explain the origin of high biological diversity in tropical ecosystems. The persistence hypothesis stipulates that the gradual evolution during the long-term survival of organisms and low extinction rates under stable climatic conditions may have lead to the present day high biological diversity in the tropics. Alternatively, the refugia hypothesis suggests that the current diversity in the tropics could be attributable to successive isolation and subsequent expansion of populations in response to frequent oscillations of the climate during the Quaternary period [1,2,3,4,5,6]. Thus, the persistence hypothesis suggests that the origin of high biodiversity in the tropics should date back beyond the Quaternary period, whereas the refugia hypothesis stipulates that the biological diversification in the tropics is more recent and coincides with the Quaternary climatic fluctuations. Numerous studies suggest that the high biological diversity in the tropics may have originated during the pre-Quaternary era [1,2,4,6,7]. However, several examples, particularly from the Neotropics, support the refugia hypothesis suggesting a recent diversification during the Quaternary period [1,2,7,8]. In contrast to the Neotropics, the evolution of biological diversity in the Asian Paleotropics may have been influenced by major geological events including the collision of the Deccan plate with Eurasia and associated topographical alterations during the Tertiary [6,9,10,11,12,13]. Our present understanding of the post-glacial time evolution of plants and animals is mostly based on studies focusing on temperate and sub-tropical organisms, and limited data are available on the post-glacial evolution in paleotropical ecosystems.
The limited understanding of the evolution of tropical plants during the Quaternary period could be attributable to paucity of reliable data from the tropical regions. The fossil records of vascular plants in the tropics representing the Quaternary period are rare [14] and pollen data may not accurately identify the responses of plants to changes in the climate during the Quaternary period [15,16]. As a means of overcoming these limitations, DNA based molecular genetic approaches have become an integral tool in understanding the genetic diversification and historical plant demographics. DNA-based genomic data of extant species are being used to infer genetic diversification and biogeographical history over the past several million years and provide a basis for inferring historical population demographics [17]. In particular, molecular dating based on chloroplast DNA (cpDNA) sequences and nuclear microsatellites provides a unique opportunity to explore the historical demography and chronology of evolutionary changes [15,16,18,19,20]. In Angiosperms, cpDNA is generally maternally inherited and dispersed through seeds, and therefore cpDNA markers provide a means to study seed-mediated species migration patterns [14,21]. The biparentally nuclear microsatellite (nSSR) data provide a means to investigate fine-scale genetic structure attributable to the combined effects of both seed and pollen flow.
Dracaena cambodiana Pierre ex Gagnepain (Asperagaceae), commonly known as the dragon blood tree is a vulnerable species and generally distributed in the escarpments of island-like limestone mountains in paleotropical Southeastern Asia. Current dragon trees mainly distributed in Asia and Africa [22,23]. D. cambodiana was the representative of dragon trees in Asia. Dispersal of tropical elements between Asia and Africa has two possible ways [6]. One was megathermal rainforest expansion during warm Paleocene and Miocene. The other one was Indian plate collision to Eurasian land during Miocene. Whatever the ways for D. cambodiana, this species should exist in Asian Paleotropics during Tertiary. Thus, D. cambodiana serves as an ideal model species to test the biodiversity-original hypotheses of plant species in Paleotropics. The objectives of this study are (i) to illustrate the genetic diversity, gene flow and past population demographics using cpDNA and nSSR; (ii) to determine the ancestral populations and past migration patterns through phylogeographic analysis of cpDNA; (iii) to date the genetic divergence under different molecular clock methods.

Ethics Statement
Dracaena cambodiana is national second class protected plants in China. This species is listed in the Inventory of Rare and Endangered Plants of China and the Key Protected Inventory of Wild Plants of China (http://db.kib.ac.cn/eflora/View/plant/ ZXBWSpecies.aspx), but it is not evaluated by the International Union for Conservation of Nature (IUCN) and it is also not recognized as the endangered or protected species in Thailand, Cambodia, Laos and Vietnam. Furthermore, the locations in Thailand, Cambodia, Laos and Vietnam are not privately-owned or protected in any way except the Pumat National Park of Vietnam. We got the permission of the Wildlife Protection and Administration Office under the Forestry Department of Yunnan, Guangxi and Hainan Province in China and the permission of the Administration of Pumat National Park of Vietnam. We also got the permission of local forestry department in other locations of Thailand, Cambodia, Laos and Vietnam. The sampling process was under the guidance of local rangers. This plant was solely used for scientific research and our sampling will not affect the regular growth of D. cambodiana.

Sample Collection and DNA Extraction
Leaf samples of D. cambodiana plants from 15 localities representing four geographical regions covering most of the natural distribution range of the species in Asia (Figure 1a, Figure 1d, Table 1) were collected, and dried using silica gel. The total genomic DNA was extracted from about 100 mg of silica-geldried leaf materials following a CTAB based DNA extraction protocol [24].

Molecular Markers
Chloroplast DNA (cpDNA) sequences. Ten individuals representing different geographic regions were surveyed for the nucleotide sequence variation at four cpDNA regions (trnL-trnF, psbA-trnH, atpB-rbcL, and trnD-trnT). Based upon sequence variability, atpB-rbcL [25] and trnD-trnT [26] regions were chosen for further analysis. These regions were PCR amplified, purified and sequenced (see online supplement for details). The DNA sequences were aligned using CLUSTALW [27] as implemented in MEGA V5 [28] and manually edited to improve the alignment. Insertions and deletions (indels) in the repeated regions were excluded from the analyses as these regions are known to evolve rapidly and prone to homoplasy [29,30]. A total of 140 individuals were sequenced and representative cpDNA haplotype sequences of D. cambodiana and outgroups were deposited in the GenBank (accessions: JF784389-JF784419).

Genetic Analyses
Diversity parameters. The sampled D. cambodiana populations were clustered into four groups based upon their geographical distribution (Figure 1).
For cpDNA data, the haplotype diversity (h) and nucleotide diversity (p) were calculated for each group and overall level using the program DNASP V5.10 [32]. The total genetic diversity (H T ) and within population genetic diversity (H S ) were calculated using PERMUTCPSSR V2.0 [33]. The genetic differentiation (F ST-cp ) among all fifteen populations and among four groups was calculated using ARLEQUIN V3.1 [34].
For microsatellite data, the presence of null alleles, large allele dropout and scoring errors due to stuttering were tested using MICRO-CHECKER V2.2.3 [35] and anomalies were rectified following Brookfield [36]. The heterozygosity (observed H O and expected H E ) and test for linkage-disequilibrium were calculated using GENEPOP V3.3 [37]. The deviations from Hardy-Weinberg equilibrium values were tested using FSTAT V2.9.3.2 [38] by calculating F IS values for each population and each locus. The statistical significance of deviations was assessed by randomization for 1000 times. The proportion of genetic variation partitioned among populations and among groups of populations were quantified using analyses of molecular variance (AMOVA) as implemented in ARLEQUIN and the statistical significance was tested with 10000 permutations.  [39]. Using nuclear microsatellite data, we assessed the genetic structure based on two differentiation indices namely F ST-SSR [40] and R ST [41]. The F ST-SSR value is based on infinite allele model and R ST estimate is based on the stepwise mutation model. The significant difference between these two estimates is an indication of the existence of phylogeographic structure. The genetic structure was inferred using SPAGEDI software [42]. We tested for the presence of isolation-by-distance (IBD) by regression analysis between genetic differentiation F ST /(1-F ST ) and the natural logarithm of geographic distance for all pairs of populations [43]. The statistical significance of IBD was tested using the Mantel test with 1000 permutations as implemented in NTSYS V2.10e [44]. We tested the phylogeographic signal using nuclear microsatellite data following Hardy et al. [45] as implemented in SPAGEDI. This approach is based on comparison between F ST-SSR and R ST which are expected to be equal under the null hypothesis of no phylogeographic signal. G ' ST of nSSR was estimated by SMOGD [46].
Detection of genetic clusters. We used nuclear microsatellite data to test for the existence of cryptic population structure employing the Bayesian model based software program STRUC-TURE V2.3.3 [47]. An admixture model with correlated allele frequencies was used for estimating the historical population admixture and number of genetic clusters (k) ranging from 1 to 20 [48,49]. All analyses employing the STRUCTURE program were performed with 20 replicates, each with a burnin period of 20 000 and a Markov chain Monte Carlo (MCMC) value set at 130000. These values were large enough to stabilize log (alpha), Ln likelihood and the MCMC chain converging to consistent end results [47]. The method of Evanno et al. [50] was used for finding the most likely value of k by plotting the log probability of L(k) and the Dk of the data over multiple runs using the program STRUCTURE HARVESTER [51]. In order to compare runs with the same value of k, we calculated the symmetric similarity coefficients (SSC) using the Greedy algorithm as implemented in the CLUMPP software [52]. The groups of runs with SSC $0.8 were combined and bar plots were prepared using the software program DISTRUCT [53]. The GenAlex program [54] was used for calculating pairwise genetic distances between individuals [55] and principal coordinate analysis (PCA).
Haplotype network analyses. The evolutionary network reconstruction and ancestral haplotype inference based on cpDNA data were carried out using the software program NETWORK V4.5.1.6 (www.fluxus-engineering.com) following the median-Joining method [56]. The ambiguous loops were resolved following the method of Crandall & Templeton [57] assuming that rare haplotypes and widespread haplotypes are more likely located at the tip and interior respectively and singletons are more likely derived from the same population. We chose Agava sp., Yucca gloriosa and Asparagus setacea as outgroups for the evolutionary network analysis. Phylogenetically Asparagus is basal to Agava and Yucca, which are closely related to Dracaena [58]. Thus we used Asparagus as the basal outgroup to root the evolutionary network.
Molecular dating and demographics of cpDNA. We used the 'Isolation with migration' (IM) coalescent model as imple-mented in the program IMa2 [59,60] to estimate the divergence times of four geographically distinct groups of populations. We calculated the divergence time, t i (i as the node of diverging groups) and the time since most recent common ancestor (TMRCA) of the species and each group (t TMRCA ). The resulting values were converted to absolute time scale, T in years using the formula T = t/mk, where m is the number of substitution per site per year (s/ s/y) and k is the length of the sequence [61]. We estimated m as 0.14647660.063280610 29 s/s/y based on uncorrelated exponential clock model (Table S1 and File S1), and the value of k is 1814 bp. The newick format of a tree for IMa2 based on pairwise F ST values among four groups (Table S2) was constructed using NTYSYS software package. The haplotype in Southern Indochina (S-I), which was found to be ancestral based upon haplotype network analysis was used for rooting the haplotype network.
The departure from population demographic equilibrium was assessed using Tajima's D [62] and Fu's F S [63] statistics using the ARLEQUIN software. The past population dynamics and the estimates of most recent common ancestor or T TMRCA were further analyzed through Bayesian skyline plot [64] as implemented in the program BEAST V1.5.4 [65]. The Bayesian skyline reconstruction and lineages through time analysis were conducted using TRACER V1.5 [66] with burn-in of 10% of chains. The details of IMa2 and BEAST analysis procedure are given in File S1. The evidence for population demographic growth was investigated through mismatch distribution analysis using the software program ARLEQUIN. The pair-wise mismatch distribution is expected to be multimodal in samples drawn from populations at demographic equilibrium and unimodal in populations that have undergone a recent demographic expansion The shared haplotypes are indicated in bold-italic.
(2) = no individuals were available for nuclear microsatellite analysis. doi:10.1371/journal.pone.0060102.t001 [67,68] or a range expansion with high levels of migration between neighboring populations [69]. The goodness-of-fit of mismatch distributions to the expected distribution under a sudden expansion model [68] was tested using the sum of squared deviations (SSD) and the raggedness index [HRag ; 70]. Population bottleneck analyses of nSSR. The excess in heterozygosity (H e ) as compared to the expected heterozygosity under mutation drift equilibrium (H eq ) for a given allelic diversity is an indication of recent reduction in the effective population size [71,72]. We used the software program BOTTLENECK [73] to assess any significant excess in heterozygosity (H e .H eq ) through Wilcoxon sign-rank test and sign test with 5000 replications. We performed the analysis under Infinite allele model (IAM) and twophase model of microsatellite evolution with the proportion of single step stepwise mutation (SMM) set to 70% and the variance to 30%. The population bottleneck is also expected to change the allele frequency distribution [74]. The allele frequency mode-shift indicator test was performed to test for significant departure in allele frequency distribution as compared to equilibrium expectations.
The population divergence under genetic drift alone or the balance between gene flow and genetic drift was assessed using the software program 2MOD V0.2 [http://www.rubic.rdg.ac.uk/ mab/software.html]. The genetic drift model assumes that fragmented populations are diverging by drift alone, and the gene flow-drift model assumes population allele frequencies are determined by the balance between genetic drift and gene flow. Two independent runs of 500000 iterations were performed and the initial 10% of the iterations were used as the burn-in. The Bayesian factors were calculated using TRACER to test the relative contribution of drift and gene flow to the population divergence.

Genetic Diversity
The length of aligned consensus sequences of cpDNA fragments, atpB-rbcL and trnD-trnT of 140 individuals representing 15 D. cambodiana populations were 858 bp and 990 bp respectively. The atpB-rbcL region included two 1bp indels and the trnD-trnT region included one 8 bp (only in one individual of the population TL), one 17 bp, one 5 bp and two 1bp indels. After excluding gaps, the length of combined cpDNA consensus sequence length was 1814 bp, which included 15 polymorphic sites and 18 haplotypes (Table 1; Figure 1).
Overall, only 4 (H1, H5 H8, H16) of the 18 haplotypes were shared among populations (Table 1; Figure 1). Only one haplotype (H5) was shared among three populations (HF, LA, SY) in different groups (NE-I, N-I, HN). The other three haplotypes were shared between two or three populations within groups (H1: NX, ML, JG; H8: JC, HF; H16: TL, BA). The remaining 14 haplotypes were fixed within each population. The haplotype diversity and nucleotide diversity of cpDNA were 0.911 and 1.81610 23 respectively ( Table 2). The total genetic diversity was 0.968. The genetic diversity within population was 0.198. Observed heterozygosity (H O ) estimated from nSSR was from 0.474 to 0.706 (average was 0.637). Expected heterozygosity (H E ) estimated from nSSR was from 0.747 to 0.919 (average was 0.948).

Population Differentiation and Phylogeographic Structure
The hierarchical AMOVA based upon both cpDNA and microsatellites data ( Table 2 and Table 3) revealed significant species level genetic structure (F ST-cp = 0.850, P,0.05; F ST-SSR = 0.329, P,0.01) and within-group population divergence (F SC-cp = 0.833, P,0.01; F SC-SSR = 0.322, P,0.01). Although cpDNA data revealed a significant genetic differentiation among four groups (F CT-cp = 0.104, P,0.01), microsatellite data showed no significant genetic differentiation among groups (F CT-SSR = 0.001, P.0.05). The gene flow (Nm) among populations based on cpDNA and microsatellite data was 0.0882 and 0.5099 respectively. The range-wide isolation by distance (IBD) in D. cambodiana was statistically non-significant (r cp = 0.027 P = 0.437; r SSR = 0.199 P = 0.084). A significant phylogeographic structure based on both cpDNA and microsatellites data was detected  Table 2). G ' ST of cpDNA was approached to 1 and G ' ST of SSR was 0.921.

Genetic Clusters
In the STRUCTURE analysis, the posterior probability (LnP(D)) gradually increased reaching the highest value corresponding to K = 9 (Figure 2A). Similarly, the DK value also showed the highest peak at K = 9 ( Figure 2A). Although few peaks of LnP(D) at K values of 12, 15 and18 were present, we chose K = 9 as the optimal number of genetic clusters. The value of K = 9 corresponded to the highest peak of LnP(D) at the lowest K value and the highest DK value, the criteria for choosing the optimal K value as recommended in the documentation of STRCTURE software (http://pritch.bsd.uchicago.edu/software). The occurrence of nine genetic clusters (K = 9), a value higher than the geographically defined four sampling regions suggests a high genetic differentiation among populations than among regions. The bar-plot depicting ancestry of samples ( Figure 2B) further indicated that the populations from Northeastern Indochina were more or less genetically similar to Northern Indochina and represents genetically diverse group. The populations in the Hainan Island and Southern Indochina showed complex genetic admixture with membership in more than one cluster. The population in the Hainan Island represents a genetically distinct group as compared to other populations analyzed in the present study.

Haplotype Network Analysis
The rooted phylogenetic network analysis of 18 cpDNA haplotypes (Figure 1b and Figure S2) showed that only haplotypes from the S-I group connected to the outgroups. The haplotype H14 showed most connections with other haplotypes suggesting haplotypes in S-I are derived from haplotype H14. This suggests that the S-I group is ancestral [75]. The haplotype networks of other three groups are complex and formed several loops. The connections among H3, H5, and H8 were resolved following the methods of [57]. With the exception of S-I group, the phylogenetic clustering of other three groups was not congruent with their geographic locations. The relationships among haplotypes H1, H4, H9 and H10 were clearly interpretable based on their geographic location after resolving the loops following the minimum evolution theory. Most haplotypes in the Northeastern group (NE-I) have derived from H11 and one haplotype, (H5) has derived from H14. Haplotypes in the Northern Indochina group (N-I) have derived from three ancestral haplotypes: H1 from H5, H2 from H11and H4 from H14.

Molecular Dating and Historical Demography
The coalescent times of the most recent common ancestor (T TMRCA ) based on the results of IMa2 and BEAST program were 11.48 (5.46-22.39) million years ago (Ma) and 9.77 (3.69-18.64) Ma respectively (Figure 3). The T TMRCA of the 18 cpDNA haplotyes was 15.80 (5.29-29.88) Ma. The posterior probability distributions of all parameters showed a high degree of convergence ( Figure S1). These results suggest that the origin of D. cambodiana dates back to Tertiary period from Middle Oligocene to Miocene. Although coalescent times of four groups dates back to Tertiary, the terminal lineages show more recent divergence dating back to the Quaternary period ( Figure 4c, Figure S3, and Figure S4).

Population Bottleneck Analysis
The population bottleneck analysis showed a mode-shift in allele frequency distribution in three populations (ML, MM and LA) from Northern Indochina (Table S5). These three populations (ML, MM and LA) along with two populations from Northeastern Indochina (PX and HF) and one population from Hainan Island (SY) showed significant deviation (Wilcoxon sign rank test, P,0.05; sign test, P,0.05) from mutation drift equilibrium (Table  S5). Among these populations, two populations (ML and MM) showed strong evidence for recent population bottleneck with statistical significance in all three tests. The population divergence analysis using 2MOD software program highlighted genetic drift as a major evolutionary force in D. cambodiana (P geneflow = 0.00014; Bayes Factor geneticdrift vs. geneflow = 101.464).

High Genetic Diversity and Limited Gene Flow
The genetic diversity estimates of D. cambodiana were higher than most high-altitude tropical, subtropical and temperate tree or shrub species in Southeast Asia [mean cpDNA HT = 0.79 in 76]. The diversity parameters (H E ) and genetic differentiation (F ST ) estimated using nuclear microsatellites were also high as compared to many tree species [77]. Plant species can survive during glacial periods in heterogeneous habitats [16] suggesting that the topologically heterogenous mountains may have served as reservoirs of genetic diversity as evident in Lithocarpus sp. in tropical Asia [78]. A high genetic diversity observed in D. cambodiana is in agreement with the postulation that high genetic diversity could be attributable to long-term evolution of plant populations isolation in refugia under fluctuating climatic conditions [14,79,80].
The limited seed dispersal and low levels of gene flow may act as a driving force of divergence among populations [81]. The rareness of shared haplotypes and prevalence of private alleles within populations of D. cambodiana indicate that the gene flow through seeds between populations is limited despite possessing berry type fruits adapted for bird-mediated long-distance dispersal. The flower of D. cambodiana is entomophilous and therefore pollen flow among distant populations is unlikely. The significant population genetic differentiation and pronounced larger N ST than G ST is an indication of limited gene flow among populations. High G' ST of cpDNA and nSSR support the strong population structure with limited gene flow among populations. Genetic diversity for each loci and each population was summarize in Table S3 and Table S4.  Although our cpDNA haplotype data showed a high genetic divergence between populations in Hainan Island and Indochina Peninsular, the microsatellite data provided insight into contemporary gene flow. The significant differences between corresponding F ST-SSR and R ST-SSR values indicated that population subdivision between Hainan Island and Indochina Peninsular reflect a phylogeographic structuring. Similarly, F-statistics and Bayesian model based clustering also reveled strong differentiation between populations in Hainan Island and Indochina Peninsular (South, North and Northeast) corroborating the cpDNA-based results and providing evidence for limited gene flow between populations. The low Nm value (,1) also suggests that the limited migration among populations may not offset the divergence due to genetic drift [82]. The bottleneck analysis supported the genetic drift biased genetic differentiation confirming the low levels of among-population gene flow. The significant positive F IS for some of the microsatellite loci is an indication that inbreeding occurs in several populations [83]. Thus, strong genetic differentiation among D. cambodiana populations appears to be the result of the combined effect of distinctly limited gene flow and significant inbreeding within populations.

Tertiary Origin and Rapid Divergence
Past geological events and associated climate change including the uplift of Himalayan-Tibetan Plateau and Quaternary glaciations [10,11,13] may have played an important role in the biological diversification, species extinction, speciation and distribution in the Asian tropics. The present topology of Asian mountains and the biogeographical boundaries in the Indochina Peninsula may have formed during the rapid uplift of Himalayan-Tibetan Plateau [11,84,85,86], erecting barriers for plant migration [9,12,18]. The present geographical distribution of D. cambodiana is closely linked to the topographical features of the Indochina Peninsula (Figure 1a), indicating genetic divergence of D. cambodiana was significantly impacted by Tertiary extrusions of mountain. The mammalian fossils data of Tertiary origin also suggest that the middle Miocene environmental change and geological events related to the collision of the Deccan plate with Eurasia may have contributed to species extinction and speciation [87]. Although BEAST and IMa differ in the assumptions of genealogy samplers for coalescence process [88], these two coalescence-based methods yielded similar results of TMRCA (9.77-15.80 Ma), confirming the robustness of time estimates based on molecular dating. Such TMRCA dating suggests that regional genetic diversification of D. cambodiana are of Tertiary origin, correlated with the rapid uplift of the Himalayan-Tibetan Plateau at 7-10 Ma [11], suggesting the influence of the uplift of Himalayan-Tibetan Plateau on genetic divergence. The historical events along with relationship between latitude and altitude indicate that D. cambodiana may have migrated to the north from the south as a megathermal rainforest element before the geological uplift of the north (Figure 1d) tracking the temperature increase during the early Tertiary [6]. This suggest that the uplift of mountains may have contributed to the increase in genetic diversity in north Indochina Peninsula. These evidences suggest that the genetic stock resulting from initial differentiation of D. cambodiana during the Tertiary may have persisted and served as the source material for further diversification.
The Quaternary period (,2.4 Ma to the present) is considered as one of the most important periods for genetic diversification and speciation [79,80,89] in a variety of ecosystems due to frequent changes in the climate and associated series of glaciations or ice ages [90,91,92]. The T TMRCA of Dysosma versipellis haplotypes corresponds to Pleistocene era suggesting the effect of glacial and  interglacial cycle on genetic divergence of plants [93]. Relatively rapid diversification of D. cambodiana terminal lineages with short branches corresponding to the Pleistocene period (Figure 4c, Figure S3 and Figure S4 ) indicates a rapid recent divergence of populations caused by Quaternary climate change [5,17,94]. However, skyline plot of cpDNA and bottleneck analyses of microsatellites suggested that D. cambodiana experienced a recent reduction in population, suggesting a Quaternary climate change also induced population size change. Thus, we inferred that evolutionary processes such as extinction-recolonization due to climate change during the Pleistocene may have contributed to the fast diversification in D. combodiana. Figure S1 The posterior probability convergence of t0, t1, t2 and t TMRCA . A: convergence of time parameters for each node in Figure 3