Reconciling the biogeography of an invader through recent and historic genetic patterns: the case of topmouth gudgeon Pseudorasbora parva

The genetic variability and population structure of introduced species in their native range are potentially important determinants of their invasion success, yet data on native populations are often poorly represented in relevant studies. Consequently, to determine the contribution of genetic structuring in the native range of topmouth gudgeon Pseudorasbora parva to their high invasion success in Europe, we used a dataset comprising of 19 native and 11 non-native populations. A total of 666 samples were analysed at 9 polymorphic microsatellite loci and sequenced for 597 bp of mitochondrial DNA. The analysis revealed three distinct lineages in the native range, of which two haplogroups were prevalent in China (100%), with a general split around the Qinling Mountains. Dating of both haplogroups closely matched past geological events. More recently, its distribution has been influenced by fish movements in aquaculture, resulting in gene flow between previously separated populations in Northern and Southern China. Their phylogeography in Europe indicate as few as two introductions events and two dispersal routes. Microsatellite data revealed native populations had higher genetic diversity than those in the invasive range, a contrast to previous studies on P. parva. This study confirms the importance of extensive sampling in both the native and non-native range of invasive species in evaluating the influence of genetic variability on invasion success.


Introduction
A central theme in evolutionary biology is to understand the processes of how species adapt to changing environmental conditions (Bell and Gonzalez 2011;Ghalambor et al. 2007). Although empirical studies have confirmed some important theoretical predictions regarding how population size influences neutral genetic variation, it remains poorly understood how, for example, initial small population sizes affect the long-term survival of populations (Hedge et al. 2014;Hufbauer et al. 2013). Biological invasions are well suited to addressing such fundamental questions since they represent a natural translocation experiment over multiple generations (Blanchet 2012;Vandepitte et al. 2014). Abiotic and biotic conditions often differ between native and invaded environments, with the introduced species not sharing recent evolutionary history with species in the native community (Gallien et al. 2012). The colonisation of the new environment by the introduced species might be impeded by factors including competition for food resources and the unpredictable impacts of native parasites on new hosts (Kumschick et al. 2015), which can constrain its ability to establish self-sustaining populations. Thus, if they are to be successful, invasive species must respond quickly and efficiently to the selective regime imposed by the colonised ecosystem (Broennimann and Guisan 2008).
Introductions of non-native species are often based on the release of a low number of founding propagules containing only a fraction of the genetic variation of the source populations (Hanfling 2007;Searle 2008). During this process, these populations often undergo a succession of genetic bottlenecks, resulting in their invasive populations having lower genetic diversity compared with populations in the native range (Besnard et al. 2014;Facon et al. 2011;Hardouin et al. 2010Hardouin et al. , 2015. Such reduced genetic diversity theoretically limits a species' ability to establish invasive populations, invoking a genetic paradox over how introduced species can successfully develop invasive populations despite this low diversity (Hanfling 2007).
This paradox of genetic diversity is often counteracted by admixture among genetically divergent source populations upon their introduction (Roman and Darling 2007;Rosenthal et al. 2008). For example, multiple introductions have resulted in high genetic diversity of invasive crustaceans, fish and lizards (Bermond et al. 2012;Biebach and Keller 2012;Chapple et al. 2013;Keller et al. 2014;Schulte et al. 2013). This can increase the level of heterozygosity in newly established populations and lead to increased fitness through heterosis (Keller et al. 2014). Sampling biases, however, such as extensive sampling in the invasive range and under sampling in the native range, often bias true estimates of genetic variability, phenotypic traits and behaviours, impeding accurate comparisons across the two ranges (Diez-del-Molino et al. 2013;Sanz et al. 2013). Such issues are particularly apparent in studies on freshwater fishes due to the inherent difficulties and expense of extensive sampling programmes (Smith et al. 2014). Since each invaded river catchment potentially represents a biogeographical island where populations may have locally adapted following separation and isolation through, for example, past geological events, it is important to have an extensive representation of variability in the native range (DeFaveri et al. 2014;Wiens et al. 2014).
Small freshwater fishes are often accidentally introduced into new environments via contaminated batches of other fishes used in angling or aquaculture Davies et al. 2013) and so invasions of small-bodied freshwater fish are often strongly associated with aquaculture trade pathways (Copp et al. 2010). The topmouth gudgeon Pseudorasbora parva, a small cyprinid fish, is naturally distributed in eastern Asia. It was introduced into Europe from China in the 1960s through a succession of accidental introductions into the area around the Black Sea when they contaminated batches of Chinese carp species being moved in aquaculture (Gozlan et al. 2010a;Gozlan 2011). It has now invaded at least 32 countries, including most of Europe, plus Turkey, Iran and North Africa, with their long-distance dispersal also occurring via aquaculture trade routes (Gozlan 2011). P. parva's expansion is not, however, just limited to a non-Asian range, with their translocation to numerous locations within China. The native range of P. parva in China encompasses the Yangtzee River around the Qinling Mountains. Human translocations have led to non-native P. parva populations in western China, Tibet and Mongolia (Gozlan et al. 2010a, b). The wide geographic range invaded by P. parva thus makes this a strong model species for understanding invasion genetics. To date, the two studies that address the genetic diversity of P. parva (Simon et al. 2011(Simon et al. , 2015 only included a very restricted sampling of their native range. In those studies, higher levels of genetic variability in the introduced range were detected than in the native range, highlighting the potential role of admixture in the invasive range in benefitting invasive vigour (Simon et al. 2011). However, the samples from the native range used potentially biased these conclusions, constraining understanding of their invasion history. Moreover, placing greater emphasis on sampling in the native range, can reveal a more complete picture of the longterm native biogeographic history of the species, including the role of geological events in shaping genetic topology (i.e. lineages), which could have influenced recent invasions (Diez-del-Molino et al. 2013).
Given the potential utility of invasive and native P. parva to assist in the understanding of the role of genetic variability in driving invasions, here we use a new and extensive collection of P. parva population samples from both their native and invasive range to: (1) characterise the present genetic diversity of P. parva by determining its spatial population structure across its major native river catchments; (2) characterise the genetic variability in the invasive range in order to better understand how it has been influenced by the invasion process; and (3) determine the influence of past geological and anthropogenic events on the current genetic structure of P. parva.

Materials and methods
Sampling Samples of P. parva from across their native range in China were collected in 17 river catchments, from the Amur River basin in the North to the Pearl River basin in the South (Table 1). Additional samples were obtained from Taiwan and Japan (one population per country). Concomitantly, P. parva samples from their invasive range were collected from 18 countries across Eurasia (Table 1). In all cases, the sampling method was appropriate for the water body in question, utilising fish traps, electric fishing and/or micro-mesh seine netting. In the field, the fish were euthanized with an overdose of anaesthetic, preserved in 95% ethanol and then stored at -70°C. In the laboratory, each fish was identified as P. parva, measured and a tissue biopsy (fin-clip) taken and stored in 95% ethanol.
Cytochrome b gene sequencing and analysis DNA was extracted from the P. parva fin-clips using DNeasy extraction Kit (Qiagen). A total of 563 Cytochrome B gene sequences were amplified using L15267 and H15891Ph primers (Briolay et al. 1998). All reactions were carried out in a final volume of 10 lL. PCR cycling parameters were as follow: 95°C for 15 min followed by 35 cycles of 95°C for 30 s, 60°C for 1.30 min, 72°C for 1 and 15 min at 70°C for the final elongation time. Exo-Sap purification (USB Corp.) was performed with the following incubation: 37°C for 30 min and 80°C for 20 min. The cycle sequencing reaction parameters were 96°C for 1 min followed by 29 cycles of 96°C for 10 s, 55°C for 15 s and 60°C for 4 min using an ABI Applied Biosystem, 3730 DNA Analyzer. The sequences generated were visualized and analysed using CodonCodeAligner (CodonCode Corp.). All the sequences were submitted to Genbank: accession number KR074432 to KR074994.
Divergence dates were estimated using the relaxed molecular clock approach implemented in BEAST v2.4.3 (Bouckaert et al. 2014). MCMC analyses were run for 20 million generations with tree sampled every 1000 generations with the model selected by jModeltest (TN ? I ? G). Convergence was checked and burn-in was estimated with Tracer v1.6 ). The trees were combined into a maximum clade credibility tree using TreeAnnotator using a burn-in of 10%, with the estimation of the mean node height and highest posterior density intervals fixed at 95%. As no reliable calibration interval was available for Gobioninae, we used secondary calibration divergence times (mean and the 95% confidence interval) obtained in Gobioninae and closely related taxa by Saitoh et al. (2011). We retained 4 calibration intervals with lognormal prior: Leuciscinae (Alburnus alburnus)/Tinca:

Microsatellites genotyping and analysis
Microsatellites from Konishi and Takata (2004) and Zeng et al. (2011) were tested and 666 P. parva samples were successfully genotyped for 9 polymorphic microsatellites (MS07, PA2, PA6, PA14, PA15, PA18_1, PA18_2, PA20 and PA22). The 5 0 ends of the forward primers were labelled with either HEX or FAM dyes. All reactions were carried out in a 5 ll volumes using DNA template at 10 ng/ll. Primers pairs were pooled and reactions were performed using the Qiagen multiplex PCR kit (Qiagen). PCR conditions were as follows: 95°C for 15 min followed by 28 cycles at 95°C for 30 s, 50-60°C (depending on the pool) for 1.30 min, 72°C for 1.30 min with a final extension at 72°C for 10 min. PCR products were diluted 1:20 in water. 1 lL of this dilution was added to 10 lL of HiDi formamide and 0.1 lL of 500 ROX size standard (Applied Biosystems) using an ABI Applied Biosystem, 3730 DNA Analyzer. Amplicons were denatured by heating at 90°C for 2 min followed by cooling at 20°C for 5 min. Alleles were scored using GeneMapper (Applied Bioscience) by two independent scorers. Allelic richness, as a standardized measure of the number of alleles per locus independent of the sample size was calculated using Fstat (Goudet 1995). The allelic richness between populations was calculated with 1000 permutations using Fstat (Goudet 1995). Calculation of the observed and expected heterozygosity as well as the average number of allele per locus was obtained using GENETIX (Belkhir et al. 2004) for each population.
To determine the genetic structure of the tested populations we used a combination of methods. Firstly, discriminant analysis of principal components (DAPC) (Jombart et al. 2010), was used to assess the population structure using the complete dataset (i.e. native and invasive populations were combined). This was performed in R using the ADEGENET package (Jombart 2008). This multivariate analysis derived the probability of individual membership in each different group. The program covered a range of possible clusters from 1 to 31, 31 representing the total number of populations in the dataset. The interaction number used was 1000,000,000. 50 principal components were retained as predictors for discriminant analysis in the 666 individuals studied.
In addition, we performed a structure analysis on the dataset. This analysis was performed separately for the Asian and European P. parva populations using STRUCTURE software (Pritchard et al. 2000), with a burn-in period of 500,000 simulations followed by a run length of 1000,000 MCMC simulations and ten iterations for each K (number of clusters) with the admixture model. To infer population structure on a finer scale, the analysis was performed from K = 1-18 for the 18 Asian populations and from K = 1-11 for the 11 European populations. The results were analysed using STRUCTURE HAR-VESTER (Earl and vonHoldt 2012) and the K was chosen using a combination of Evanno's criteria (Evanno et al. 2005) and investigating the convergence of the runs using CLUMPP (Jakobsson and Rosenberg 2007). Results were summarized using CLUMPP (Jakobsson and Rosenberg 2007) and displayed using Distruct (Rosenberg 2004). We tested the scenario that P. parva populations geographically closer to one another would also be more genetically similar by generating matrices with Fst values using Arlequin (Excoffier and Lischer 2010), and correlating these to geographic distances between these populations. The two matrices were correlated using the Mantel test (R software, Vegan).
In Japan, the most prevalent lineage was Lineage 4 (Figs. 1, 2). This lineage had 4.7% divergence with the main lineages (1 and 2) with 28 mutations difference to the closest sequences. Lineage 2 was present in Japan but at a lower prevalence (Lineage 2: 30% and Lineage 7: 70%; Fig. 1). Lineage 3 (green) was only been found in Iran and diverged by 5.2% from the other lineages (31 mutations difference with the closest sequence).
In the invasive populations of Europe, only lineages 1 and 2 were identified (Lineage 1: 69% and Lineage 2: 31%). All individuals sampled from Bulgaria, Turkey and Armenia shared the same haplotype. Overall, the average haplotype diversity was lower in Europe (0.4) compared to China (0.68). The European individuals were present throughout the phylogeographical tree (Fig. 2), suggesting several introductions had occurred. Pairwise Fst calculations between Europe, China, Taiwan, Iran and Japan revealed that China and Europe were genetically similar (Fst = 0.09), indicating a direct relationship between those populations (Table 2).

Historical demography and haplogroup dating of native Pseudorasbora parva
The historical demography of China, Europe, Iran, Japan and Taiwan was investigated using Tajima's D and Fu's Fs values. The values were positive and not significant for all populations except for China. Tajima's D is -1.57 (p = 0.02) and Fu's Fs is -24.47 (p \ 0.01) for China, indicating that the population is expanding. Regarding haplogroup dating, the chronogram obtained with BEAST revealed the two main lineages, 1 and 2, separated in China 2.5 Mya [0.9, 4.5] with 95% confidence intervals (Fig. 3). Lineages 3 (Iran) and 4 (Japan) separated 3.9 Mya [1.2, 6.7] and diverged from lineages 1 and 2 12.1 Mya [6.5, 18.1] (Fig. 3).

Microsatellite data
A total of 666 individual P. parva were successfully genotyped at 9 microsatellite loci. The allelic richness between Europe and China was calculated to be 2.46 and 2.81 respectively, with this difference significant (p = 0.02). The allelic richness of the population located to the north of the Yantze River (2.75) was compared to the populations located to the south of the Yantze (2.91) was not significant (p = 0.39). Compared with the allelic richness of populations to the north and south of the Yantze river, the European populations were significantly different from the c Fig. 1 Summary of mitochondrial haplogroup and the population structure derived from microsatellites across Pseudorasbora parva's geographical distribution: a native range, China, Taiwan and Japan and b Eurasia. Abbreviations of locations can be found in Table 1. Lineage 1 and 2 are represented in pink and blue respectively 123 Fig. 2 Bayesian phylogeny reconstructed with cytochrome b dataset. The support, posterior probabilities determined with MrBayes, is indicated for each node. Only the nodes with a support above 50 were retained. Blue stars indicate European samples southern populations (p = 0.01) but not from those to the north (p = 0.09). Chinese populations showed a higher genetic diversity, with a Hobs average with and without the two invasive Chinese populations (S19 and S20) of 0.66 and 0.65 respectively. The average number of allele per locus with and without S19 and S20 was 12.05 and 11.67 respectively. European populations had a lower genetic diversity with a mean Hobs average of 0.55 and the average number of allele per locus of 5.56 (Table 1). Populations S19 and S20 are invasive P. parva populations in China and they have a genetic diversity that is lower than native Chinese populations but higher than European populations (average Hobs = 0.60, average number of allele per locus 8.78). The Fst values for microsatellites were calculated using Arlequin (Excoffier and Lischer 2010) and can be seen in Table 3. Interestingly China and Europe have a very low Fst value (0.04-see Table 3) which expected as P. parva populations from China was introduced in Europe. Iran has very high Fst value in agreement with the mitochondrial data. The concordance between the mitochondrial and microsatellites makers was found to be positive with a R 2 coefficient of 0.69.
In the DAPC analysis that assigned the individuals to their genetic population, the lowest BIC value (851.07) was found for K = 13. The first 50 PCs of the PCA, explaining 71.2% of the total variance, were kept. Out of the first six axes, several clusters were found, with Iran showing the highest divergence from the other populations (Fig. 4a). Notably, Spain's genetic distinctness compared to other invasive populations is explained by Axis 3. This result might be due to the low genetic diversity of the Spanish population (Table 1). The two populations from Bulgaria and Turkey (Fig. 4b) clustered together in Fig. 3 Chronogram based on BEAST analysis of the cytochrome b dataset. The four calibration points are indicated by a star (see text for details); for each node, the mean and the 95% credibility intervals are indicated

123
Axis 4 and, finally, Taiwan separated from the other populations on axis 5 (Fig. 4c). A STRUCTURE analysis was performed, using the Evanno et al. (2005) method and investigating the convergence of the runs using CLUMPP (Jakobsson and Rosenberg 2007). Using the pairwise G 0 parameter from CLUMPP (Jakobsson and Rosenberg 2007), we found that only 2 K converged (K = 9 and K = 10) in the Chinese population analysis. K = 9 was selected as the best model for the ancestral populations in China as it had the highest DK (Fig. 5). The clustering of the different genetic populations was in agreement with the geographical distribution of the sampled populations (Fig. 1a). The Taiwan populations formed an independent genetic population. S7, S9, S19 and S20 (South-West China) clustered together, as per the DAPC results (Fig. 3). The populations from North-East China, S13, S14 and S15 formed another cluster. Populations from western China, S3, S4 and S5, clustered together, and populations S1 and S16 formed the last cluster. Populations S6, S10, S11 and S17 represented 4 different genetic populations. In Europe, the Evanno method and the study of the convergence using CLUMPP revealed K = 6 (Fig. 5b), where independent clusters were (1) Bulgaria and Turkey, (2) Austria and UK, (3) Hungary, (4) Belgium and Poland, (5) Italy, Spain and final (6) Slovenia. Interestingly K = 2 was also found to be convergent with two clustered being formed (1) Bulgaria and Turkey and (2) the rest of the European populations. Bulgaria and Turkey have the same mitochondrial haplotype and this result suggests a single introduction of P. parva in those two countries. The Mantel test indicated that populations that were geographically furthest apart (geographically) were also more genetically distant (Mantel statistic r = 0.5472; p = 0.002).

Discussion
These genetic results for native and invasive P. parva populations confirm the importance of extensive sampling in both ranges for invasive species in evaluating the influence of genetic variability on invasion success. Analyses revealed that P. parva had higher genetic variability in the native range than the invasive range, contrasting with the previous findings of Simon et al. (2011Simon et al. ( , 2015. The successful invasion of 32 countries by P. parva has been previously attributed to admixture of different source populations in its invasive range (Simon et al. 2011(Simon et al. , 2015. However, our data demonstrated that admixture had already occurred in the species' native range, through  Fig. 4 Representation of the microsatellite variation on the first six axes of a DAPC. The percentage of the total eigenvalue explained by: a Axis 1 is 34.07% and Axis 2 is 17.36%; b Axis 3 is 12.92% and Axis 4 is 9.27%; c Axis 5 is 7.81 and Axis 6 is 5.09% its movement along with aquaculture-valuable species (such as Chinese carps), resulting in highly variable, admixed populations in Chinese reservoirs and aquaculture facilities that were likely to be the source populations for the European introductions.
Genetic variability across its invasive range The invasive range of P. parva not only includes Europe, but also regions within China (i.e. populations S19 and S20). Our results indicated that the P. parva introduction to Europe originated from China, not Japan, providing strong support that introductions were ultimately associated with commercial links between China and Eastern European countries in the mid to late Twentieth Century . The first introduction of P. parva has been attributed to Romania (Nucet Fisheries Research Centre, in 1961), with imports of eggs of silver carp Hypophthalmichthys molitrix and grass carp Ctenopharyngodon idella from Wuhan, China (Gozlan 2011). However, similar cooperation programmes took place at the same time within countries of the Eastern Bloc and resulted in further introductions of P. parva to Hungary (Paks Fisheries Farm, in 1963), Lithuania (Dunojus Lake, in 1963) and the Ukraine (Kuchurganskoye reservoir, in 1962) (see Gozlan 2011 for full details). There was lower genetic diversity in the European and Chinese P. parva invasive ranges compared to the native range, as would be expected due to population bottleneck events during the introduction and colonisation process (example: Schimd-Haempel et al. 2007;Hanfling 2007;Hardouin et al. 2010). Introductions from China were also identified in Taiwan and Japan. Japan has endemic lineages of P. parva (Lineage 4) and more recent introductions from China (Lineage 2). The genetic analyses of the Iranian P. parva indicated that their source was genetically different to all the sampled populations in the Chinese native range. This population had a single mtDNA lineage (Lineage 3) that was highly divergent, with 5.2% difference in comparison to the lineage found in China (31 mutations difference with the closest sequence). The source population is thus divergent to the ones sampled in the Chinese native range but more closely related with the endemic population of P. parva in Japan (Lineage 4). These two lineages have been separated 3.9 Mya, thus determining the potential population source of this introduction would require a more intensive sampling of the native range of P. parva, including the Korean peninsula, which is currently missing.
Phylogeography in the native range: China, Taiwan and Japan Understanding the phylogeography and the factors affecting the genetic structuring of invasive species in their native range reveal historic and recent selection events that could have influenced invasion success (Zenni et al. 2014). Recent evidence indicates that genotypes that perform well in their introduced ranges can result from both past evolutionary events in their native range or rapid adaptation following introduction to new environments (Zenni et al. 2014). Here, the Qinling Mountains have played an important role in influencing the phylogeography of P. parva. These mountains represent a natural boundary between the North and the South of the country, dividing the Chinese temperate and sub-tropical climatic zones (Ding et al. 2013), resulting in differentiated terrestrial and freshwater fauna (Li 1981;Zhang 1999). The split between lineages 1 and 2, estimated around 2.5 Mya, might have been a consequence of the fast uplift (500-1000 m) of these mountains that was influenced by the Qinghai-Xizang Movement, initiated approximately 3.4 Mya (Teng and Wang 1996;Zhang and Fang 2012).
This distribution of the P. parva has also been influenced by past human activities that resulted in gene flow between populations to the north and south of the Qinling Mountains. For example, the construction of the Lingqu Canal 2200 years ago (Zhong 1990) connected the Yangtze River and the Xijiang (Peal River basin), leading to a similar P. parva population structure in these areas (S7 and S9). The Beijing-Hangzhou Grand Canal (built from AD 581-618), flows across several river systems including the Yangtze, Huahe, Yellow and Haihe Rivers in the East of China and could explain the presence of the southern lineage 2 in some northern populations (S16, S17, CRH, and S3). China's long history of carp aquaculture and aquaculture trading is also likely to have been an underlying process in formation of similar genetic structures observed in some other populations (e.g. S4, S5, S6) (Yue and Liang 1995;Zhao et al. 2015).
To the north of the Yangtze River, two population clusters were detected, corresponding to the Yellow River (S1, S2, S16) and the Amur River Basin (S13, S14, S15), which includes populations from inner Mongolia (S17). Again, although there was clear genetic structuring of these populations, they mostly formed part of the same northern Lineage 1, indicating recent colonisation of these catchments, probably corresponding to the development of pond aquaculture in China that can be traced back approximately 3300 years (Yue and Liang 1995) and which significantly increased during the Han dynasty (Shi 1999).
Finally, the specific population structure indentified in the two highly admixed populations (S10, S11), which displayed high genetic diversity, was likely to have resulted from the construction of many small and mid-sized dams during the late 1950s-1960s (Gong and Tu 1991) which were restocked (e.g. lakes and reservoirs). In effect, the presence of numerous large reservoirs in this region, such as Danjiangkou, Hongshuihe, Liuyanji, Tankou, Wujiaying and Xipaizihe, made it a hotspot for fish farming in the late 1960's. This contributed to the highly admixed populations, in a period that also corresponds to the first introductions of P. parva in Europe.

Conclusion
Our study highlights the importance of understanding both old and new processes shaping the genetic structure in the native range of invasive species. These can reveal processes, which could have enhanced the invasive potential of the species allowing them to quickly respond to new challenges (small founding populations, exposure to new parasitic communities, etc.) and therefore resolve the potential invasive paradox. Thus, future studies on invasive species should employ holistic approaches encompassing the role of both genetics and plasticity in biological invasions, with extensive sampling and studies across all ranges.