Phylogeography of Diptychus maculatus (Cyprinidae) endemic to the northern margin of the QTP and Tien Shan region

Phylogeography and historical demography of the cyprinid fish Diptychus maculatus (subfamily Schizothoracinae) are evaluated across three river systems in the Northern Qinghai-Tibetan Plateau (QTP) and Tien Shan range: the Indus River, Tarim River and Ili River. Results from both mtDNA (16S rRNA, Cyt b and D-loop) and nucDNA (RAG-2) resolved four reciprocally monophyletic clades, representing populations from Indus River, South Tarim River, North Tarim River and Ili River, respectively. The divergence times was estimated to be 1.5–2.5 Mya. It is consistent with the hypothesis that the split of four clades is the consequence of vicariance resulting from both the intensive uplift of QTP and Tien Shan as well as the resultant expansion of the Taklimakan Desert. Several lines of evidences indicate dynamic demographic histories for the populations, with late Pleistocene and Holocene population bottlenecks and expansions except the Indus River. Our results clearly depicted the phylogenetic relationship of D. maculatus from Indus River, Tarim River and Ili River. The analyses implicated the relationship among the distribution of D. maculatus, paleo-drainages and geographic events, and implied the existence of the South Tarim River in history.


Background
Tibetan Movement was the most important geological event in the Quaternary period, causing a series of large geomorphological adjustments and producing the present geomorphic, hydrologic as well as tectonic configurations [1]. According to Li [1], this significant movement included A, B and C phases occurring at 3.6, 2.5 and 1.7 Ma respectively. The Phase C marked the beginning of a new era of "World Ridge" development [2]. Especially between 1.1 and 0. 6 Ma, the so-called the Kunlun-Yellow River Movement elevated Qinghai-Tibetan Plateau (QTP) rapidly to an average height of 3000 m with mountains up to over than 4000 m [3,4]. Followed by Tibetan Movement, the Tien Shan area elevated again and reached the current height and formed the modern geomorphologic pattern of the northwestern China [5]. Meanwhile, the QTP blocked the sea winds of the Indian Ocean from going further into northern China, causing the development of the eastern monsoon and the formation of an arid and semi-arid climate in this area. The direct outcome of the dryness was the development and spread of Taklimakan Desert, the second largest shifting sand desert in the world, which covered the Tarim Basin surrounded by QTP and Tien Shan [2,6,7]. The expansion of Taklimakan Desert led to the change of watercourses, which deteriorated the ecosystem in the area [8].
The theories of modern phylogeography are based on the assumption that the biological evolution and geographical changes are synchronized [9]. Since the geographic changes will result in spatial segregation [10], characterizing the current distribution of the endemic fauna of on QTP may help us to explore the geological histories of these mountain-river systems within and around the region. It is well accepted that the evolution and distribution patterns of native freshwater fishes reflect the paleogeographical complexity of rivers [11][12][13]. The evolution and distribution patterns of schizothoracine cyprinids reflects the paleogeographical history of the QTP and adjacent regions [14], especially the evolution of paleo-drainage systems, of the area [15].
Diptychus maculatus is the species in a monotypical genus of the subfamily Schizothoracinae (Cyprinidae), which is well adapted to high-mountain streams with low temperature and hypoxia on QTP and Tien Shan [16,17]. It is mainly distributed in the Indus River system, the Tarim River Basin and the Ili River-Balkhash Lake Basin. These big river systems were originated from the surrounding mountains. The Tarim Basin, the largest endorheic basin on our planet ( Fig. 1), is bordered by the Tien Shan Mountain in the north and by Karakoram-West Kunlun Mountain in the south, which is also at the north edge of the QTP. From the north slope of Tien Shan, the Ili River flows northward into the Lake Balkhash. The Indus River flows southward to the Indian Ocean, which was originated in the Karakoram Mountain. The current Tarim River system has a five-source-one-mainstream pattern of which all tributaries originate in the south slope of Tien Shan and the north slope of West Kunlun Mountain. They flow into the Tarim Basin, across the desert from west to east, and converge on in the final destination in Lop Nur. Due to the uplift of QTP and Tien Shan in Quaternary period, the geographical and natural environments varied dramatically in the three river basins, especially in Tarim River, which is quite different from its paleo-pattern (Additional file 1: Figure S1). D. maculatus, as the endemic fishes, experienced all the environmental and geographic changes, hence, we hypothesize that the present distribution of D. maculatus is the consequence of a series of geomorphological adjustments in the region.
The previous researches on the D. maculatus focused on two aspects. One was the correlation between phenotypic variation (scale and grill rake numbers) and river systems, implying the species were adapted to the local aquatic environments by morphological changes [16,18]. The other is the discovery of phylogeographic structure of this species [19,20]. However, these studies were largely questioned and unable to uncover the entire evolutionary relationship between the fish and the mountain-river systems because of the inadequate samples size, incomplete geographic coverage (for example, no Indus River), and only mtDNA, the matrilineal marker, were used [15,[21][22][23]. To overcome these issues, we reconstructed the phylogeographic histoy of D. maculatus by applying both mtDNA and nucDNA markers together with the intensive sampling covering all the habitats. In the current study, we successfully describe the phylogeography and evolutionary history of the species. Meanwhile, our analyses supported the association among the uplift of QTP and Tien Shan, the paleodrainage systems as well as the present distribution of D. maculatus.

Sequence data
Three mtDNA and one nucDNA markers of D. maculatus were sequenced: the aligned 16S rRNA sequences (1118 bp) with 47 variable sites and 40 parsimonyinformative sites; the Cytochrome b (Cyt b) sequence (total length of 1140 bp) with 135 variable sites and 127 parsimony-informative sites; the D-loop sequences (708 bp) with 63 variable sites and 61 parsimony-informative and two insertion/deletion sites; the segment from RAG-2 (1250 bp) with six variable sites.
We detected seven RAG-2 haplotypes from all 261 samples of D. maculate. Among the combined mitochondrial sequences (2966 bp) from all individuals, 70 unique haplotypes in total were identified (Table 1 and Additional file 2: Table S1). The mean divergence among mtDNA haplotypes was 2.19 %.

Phylogenetic analyses
Based on the combined mtDNA data, the same topologies were recovered by both the Bayesian and ML trees (Fig. 2). The result showed a phylogeographic structure in which four distinct haplogroups corresponded well to four independent evolutionary clades, representing populations from South Tarim River, North Tarim River, Indus River and Ili River ( Figs. 1 and 2). The Taklimakan Desert separates the South and North Tarim River; while the Ili River and the North Tarim River are separated by Tien Shan; as the northwest edge of QTP, Karakoram Mountain separates the Indus River and the South Tarim River (Figs. 1 and 2). Within North Tarim River clade, three subdivisions (Kashgar, Aksu and Weigan River)were further classified, in a good agreement with three main tributaries of the North Tarim River ( Fig. 2; Additional file 2: Table S1). Within South Tarim River clade, two subdivisions (Yarkand and Hotan River) were corresponded with two tributaries ( Fig. 2; Additional file 2: Table S1). Moreover, the haplotype network for the Yarkand subclade contained two clusters of haplogroups that were separated by 14 mutations (Fig. 2). No haplotypes were shared between clades or subclades.
The haplotype network of the nucDNA marker, RAG-2 was inconsistent with the phylogenetic results of the mtDNA data ( Figs. 2 and 3). The Ili River collection (N = 47) and Indus River collection (N = 34) were monomorphic for haplotypes R6 and R7 respectively. Also, within the North Tarim River, haplotype R1 was monomorphic in the Weigan River collection (N = 34). In the Kashgar River collection (N = 4), three of them were monomorphic for haplotype R3, and the remaining one shared haplotype R2 with the Aksu River collection (N = 23). Additionally, within the South Tarim River and the Yarkand River collection (N = 57), 52 samples were monomorphic for haplotype R4, and the remaining five shared the same haplotype R5 with the Hotan River collection (N = 8) both ( Fig. 3; Additional file 2: Table S1). The nucDNA marker indicated that Weigan River population (haplotype R1) was genetically closer to South Tarim population (haplotype R5) with only one mutation than to North Tarim population (haplotype R2) with at least three mutations, which was quite different from the mtDNA analysis.  Table 1 Nucleotide diversity and genetic structure The total haplotype (h) and nucleotide (π) diversities were significantly high for all the four clades (Table 2). However, the nucleotide diversity was very different for among clades, higher in the South Tarim clade (0.6996 % ± 0.3469) and North Tarim clade (0.9799 % ± 0.4779) than quite low in the Ili River clade (π = 0.0806 % ± 0.0495) and Indus River clade (π = 0.0943 % ± 0.0568) ( Table 2).
To clarify the phylogenetic structure, analyses of molecular variance (AMOVA) was conducted based on the combined mtDNA data. The result showed that four clades accounted for most of variation (65.77 %), rather than three clades (39.51 %) ( Table 3). Distributed respectively among the groups of the North Tarim River and the South Tarim River, 96.28 and 87.92 % of total variation were observed, with only 0.04 and 4.85 % of the variation among populations within the groups ( Table 3).
As suggested by the phylogenetic inferences ( Fig. 2), the highest and most significant values of pairwise comparisons of the genetic differentiation (F ST ) were detected between all the four clades averaging more than 0.962, also between all the subclades averaging more than 0.940 (Table 4 and Additional file 3: Table S2). The values of F ST within the clades were high and significant except in the Indus River clades. In contrast, the values of F ST within the subclades were very low and not significant except that within Yarkand River. Mean divergence values between populations also strongly supported four clades inferences (Additional file 4: Table S3).

Demographic history
The estimated time to the most recent common ancestor (TMRCA) with 95 % highest posterior density (HPD) was showed in Additional file 5: Figure S2. The TMRCA of the in-group was estimated around 2.07 Ma with a 95 % HPD range from 1.71 to 2.45, indicated Early Pliocene divergences. The separation of Indus River and Tarim River clades occurred at 1.87 Ma with a 95 % HPD ranging from 1.55 to 2.19. The divergence time of North and South Tarim River clades by Taklimakan Desert was 1.77 Ma with a 95 % HPD ranging from 1.47 to 2.09.
Historical demography for the clades Tarim, Ili and Indus River provided the independent evolutionary process of D. maculatus from different drainages ( Fig. 4; Table 2). The Tajima's D values and Fu's Fs values indicated the population experienced a quick expansion in all clades and subclades except in the Indus River clade. Correspondingly, the star-like structure of haplotype networks within the eight clades and subclades also suggested the expansion event in the Ili River as well as North and South Tarim rivers (Fig. 2).
The Bayesian skyline plots (BSPs) showed that the population size was not statistically significant changed because of broadly overlapping HPDs (Fig. 4). However, the trend for each population size was consistent with the neutrality test results. For the North Tarim River clade (Weigan and Aksu River), BSP indicated a moderate and recent (25 and 10 Kya) expansion, whereas BSP of South Tarim River clade

Discussion
Phylogeography and historical demography of D. maculatus The current study drew a whole picture of phylogeography of D. maculatus for the first time by the intensive sampling across the north of QTP and adjacent regions, from three major drainages: Indus River, Tarim River and Ili River. We identified four clades, of which two corresponded geographically to the Indus River and Ili River respectively, and the other two from Tarim River population separated by the Taklimakan Desert ( Figs. 1 and 2).
The divergence history of D. maculatus clearly showed that the split of the species occurred when the geographic events happened. For example, the estimated producing times of four clades corresponded to the timing of the uplift of Tien Shan in the Early Pleistocene period and the third phase of the "Tibetan movement" at 1.7 Ma [1]. The forming of three subclades within the North Tarim River also occurred when Tien Shan uplifted in the Early Pleistocene period. The two subclades within the South Tarim River clade were split during Kunlun-Yellow River Tectonic Movement (1.1-0.6 Mya) [1]. Therefore, we concluded that the allopatric divergence producing four clades and five subclades of D. maculatus was likely associated with the vicariance caused by the rapid uplift of Tien Shan and QTP.
Several lines of evidence confirmed a dynamic demographic history for the five clades/subclades resolved in D. maculatus. First, star-like haplotype networks with high level of haplotype diversity and low nucleotide diversity indicated all population but Indus River clade Fig. 2 The Bayesian inference tree for Diptychus maculatus based on the 70 haplotypes from combined mtDNA. Numbers on branches indicate, posterior probability in BI analyses followed by bootstrap supports for ML the node. The corresponding median-joining network based on the combined sequence data for each clade is depicted to the right of each clade. The haplotype numbers correspond to those in the Additional file 2: Table S1. The circle sizes represent the approximate numbers of individuals, and the scale is provided in the lower right corner. The black dots indicate the nucleotide substitutions inferred for that branch. The geographical origins of the haplotypes are illustrated by the same colors used Fig. 1 experienced the bottleneck effect followed by the population expansion [22,24]. Second, the negative Tajima's D and Fu's Fs statistics together with the BSPs indicated Yarkand, Aksu, Weigan and Ili river populations expanded in a late Pleistocene or Holocene.
The principal reason of the demographic changes was the climatic and environmental alterations caused by the geographic events. In the Late Pleistocene and Holocene era, the large-scale and rapid uplift of QTP resulted in the dramatic changes in both climate and ecology [2]. The BSPs for Yarkand and Weigan Rivers indicated that the expansion began 25 Kya, and the expansion for Aksu River began 10 Kya, coinciding with the uplift of Karakoram-West Kunlun Mts. This movement allowed the westerly wind to come into northern slope of Karakoram-West Kunlun Mts., which sequentially lead to the warm-humid climate and river development in the area [25]. In contrast, because the Indus River population was located in the southern slope of Karakoram-West Kunlun Mts. where the westerly wind could not reach, the population size was gradually declined as indicated by the BSP analysis. Notably, samples of Indus River population were collected from very high altitude (4810 m), and the harsh conditions presumably had influences on the genetic variations of the population. The small number of haplotypes discovered in Indus River population resulted in the lack of signal to evaluate the dynamic demographic history. The Ili River, protected by the Tien Shan from the Taklimakan Desert, was consistently under the influence of the westerly circulation, leading to a small difference in precipitation between cold and warm periods in the Late Pleistocene period [26], therefore, the population was maintained stably, until 7-5 Kya. Ili River population was expanded during the late Holocene period (6-1.5 Kya) with a relatively wet conditions induced by stronger westerly circulation [27]. BSPs pointed out all the populations experienced a depression in 19 th century when population explosion as well as the development of agriculture and industry caused by the urbanization along the rivers. The ecological consequences of human activities were water shortage, soil salinization, enhanced ecological degradation and desertification.   Table S1. The circle sizes represent the approximate numbers of individuals, and the scale is provided in the lower right corner. The black dots indicate the nucleotide substitutions inferred for that branch. The geographical origins of the haplotypes are illustrated by the same colors used Fig. 1 The collection of samples turned out to be very challenging in Kashgar and Hotan Rivers because of both environmental and climatic changes (aridification and desertification) as well as human activities such as overfishing, reservoirs, groundwater wells and hydropower stations. Especially, jade mining in Hotan River [19], changed the watercourse and destroyed freshwater habitats.

Drainage history of Tarim River
Compared with the modern drainage patterns, the paleodrainage was in better agreement with the phylogeographic structure which aims to decode the history of species evolution, and to reconstruct the sequence of evolutionary events [17,28,29]. The current Tarim River has a five-source-one-mainstream pattern, i.e., the North Tarim River. However, it is a controversial view about the existence of the South Tarim River among scientists, some consider that the South Tarim River existed in history, but some others disagree [30]. We used a phylogeographic approach to assess the evolution and distribution patterns of freshwater fish D. maculatus generated from mtDNA data, which reflect the palaeogeographical complexity of a region, especially, the development of rivers and their isolation and interconnection processes [29]. Based on our analysis, populations from five tributaries of Tarim River were classified into two clades, the North clade containing three subclades (Weigan, Aksu, Kashgar), and the South one containing two subclades (Yarkand and Hotan), which were geographically divided by the Taklimakan Desert. Similarly, the nucDNA RAG-2 showed that the gene flow existed within the north or south clade only. It seemed that the disintegrated modern Tarim River system was inconsistent with the phylogeologic history of D. maculatus. Therefore, our observation supported the hypothesis that the South Tarim River has indeed existed in history. In the further analysis, more nucDNA markers are required to verify the hypothesis accurately.
Ayelhan et al. [31] proposed that D. maculatus originated from West Kunlun area. Amongst the five subclades of Tarim River, the highest values of F ST and nucleotide diversity in Yarkand population suggested it was a possible center of the origin of the species.

Sampling and laboratory procedures
D. maculatus is a species under protection, hence difficult to sample [16,17,32]. The entire animal experiment was conducted according to the principles expressed in the "Guide for the Care and Use of Laboratory Animals" by National Research Council of the National Academies of Science. Samples were collected using gill nets or cast nets in June and July 2010, May to June in 2013, as well as July and August in 2014. All the specimens were preserved in 95 % ethanol for further laboratory analyses. The definition of samples was on the basis of the classical taxonomic description by Chen & Cao [17] and Wu & Wu [16]. A total of 261 individuals  of D. maculatus were used for the phylogenetic and population genetic analyses (Additional file 2: Table  S1). The samples were collected from 16 populations across three different river systems in northwestern China (Fig. 1), including 47 individuals from Ili River Basin, 34 individuals from Indus River Basin, and 180 individuals from Tarim River Basin (Table 1). According to the comparative morphology [16,17] and molecular phylogenetics [33,34], Gymnodiptychus dybowskii (GenBank accession number KJ081377 for 16S rRNA and KJ081423 for Cyt b) [22], Aspiorhynchus laticeps (KF564793) [35], and Barbus barbus (AB238965) [36] were used as outgroups in the phylogenetic analyses based on their locations in the Tarim River (Xinjiang), Tarim River (Xinjiang) and Danube River (Austria) respectively. Voucher specimens were deposited in the key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Plateau Institute of Biology, the Chinese Academy of Sciences in Xining (Additional file 2: Table S1). Total genomic DNA was extracted with proteinase K followed by the standard 3-step phenol-chloroform method [37]. Complete sequence of Cyt b gene, partial sequences of the 16S rRNA, D-loop and RAG-2 genes were obtained for all the sampled individuals. When direct sequencing of RAG-2 failed due to heterozygosity, amplicons of RAG-2 were cloned using a TA-cloning system (TakaRa Biotechnology, Dalian, China) following the manufacturer's instructions [38]. The master mixture of polymerase chain reaction (PCR) contained approximately 100 ng of template DNA, 1 μL (10 pmol) of each primer, 5 μL of 10× reaction buffer, 2 μL of dNTPs (2.5 mM of each) and 2.0 U of Taq DNA polymerase, in a total volume of 50 μL. Reactions were carried out on a Veriti Thermal Cycler (Applied Biosystems, Carlsbad, CA, USA) and always included a negative control. Individual thermal cycling parameters for each primer set are provided in Table 5, and all protocols began with 3 min at 95°C and ended with 10 min at 72°C. The amplified DNA was fractionated by electrophoresis through 0.8 % low-melting agarose gels, recovered from the gels, and purified with Gel Extraction Mini kit (Watson Biotechnologies, Shanghai, China). The purified DNA Sequencing was sequenced with the Perkin-Elmer BigDye DNA Sequencing Kit according to the manufacturer's protocol, using the same primers employed in the PCR.

Phylogenetic analyses
Phylogenies of the mtDNA data were constructed using maximum likelihood (ML) and Bayesian inference (BI) as implemented in PHYML 3.0 [41] and MRBAYES 3.2.1 software [42]. The most appropriate nucleotide substitution models for the three segments were selected using Akaike Information Criterion as implemented in JMODELTEST 2.1.4 software [43]. Using a starting tree obtained by neighbour-joining. Clade robustness was assessed by bootstrap analysis using 1000 replicates. To assess the statistical significance of nodes, a bootstrap analysis with 100 replicates was used for the ML analyses, remaining settings set to default. The posterior distributions from BI were obtained by a Markov Chain Monte Carlo (MCMC) analysis with one cold chain and three heated chains. Samples of the trees and parameters were drawn every 100 steps from a total of 1 million MCMC generations. Three additional runs were conducted beginning with random trees. The 50 % majority rule consensus of the postburn (using a burn-in of 25 %) for all the generations was computed for all the four runs.
The NETWORK 4.6 [44] was used to build a medianjoining network for both mtDNA and nucDNA data.

Molecular diversity and genetic structure
We used ARLEQUIN 3.5 software [45] for AMOVA and pairwise F ST values. Both AMOVA and F ST used Tamura & Nei genetic distance [46] with gamma correction for heterogeneity of mutation rates.

Population demography
The time of divergence was estimated using a strictclock Bayesian approach in BEASTv1.8.0 [47], with the GTR + G substitution model suggested by JMODELT-EST 2.1.4 software [43], with Yule prior approach, and a random starting tree. In the absence of a fossil record of schizothoracines, we used the divergence-time estimates from He et al. [34] for an internal time calibration on branching point: D. maculatus vs. G. dybowskii (7.77 ± 0.51 Mya). Another calibration point was based on the European fossil records of the genus Barbus, species B. barbus from the Pliocene period (15-11 million years ago, Ma) [36,48,49]. The analysis included three independent MCMC runs sampled every 1000 generations for 30 million generations with 20 % of samples discarded as burn-in. The effective sample size for parameter estimates and convergence was checked using TRACER1.5 [50]. The corresponding tree files were merged with LOGCOMBINER v1.8.0, and trees were summarized using TREEANNO-TATOR v1.8.0, two software programs distributed within the BEAST package [47]. Trees were visualized in FIGTREE 1.4 [51].
We used three different approaches to assess demographic history. These included two neutrality statistics, Tajima's D and Fu's Fs [52,53], calculated with 5000 permutations in ARLEQUIN 3.5 [54]. The negative values for these indices are consistent with population expansion. To evaluate the timing of any expansion, we used BSP obtained with BEASTv1.8.0 [47] derived with three independent runs (30 million simulations; first 10 % as burn-in). Both log files and tree files were analyzed using TRACER1.5 [50]; effective sample size for all parameters was more than 200. The results were consistent across runs. We then ran these analyses under the GTR + G model, with a strict molecular clock. The substitution rate for combined three mtDNA genes was estimated from the previously described analysis of divergence times to be 0.82 % per Mya with a 95 % highest posterior density (HPD) of (0.69, 0.91). We could not run BSPs for Kashgar and Hotan River population due to the small number of samples available (<10).