Genetic Variation May Have Promoted the Successful Colonization of the Invasive Gall Midge, Obolodiplosis robiniae, in China

Invasive species often cause serious economic and ecological damage. Despite decades of extensive impacts of invasives on bio-diversity and agroforestry, the mechanisms underlying the genetic adaptation and rapid evolution of invading populations remain poorly understood. The black locust gall midge, Obolodiplosis robiniae, a highly invasive species that originated in North America, spread widely throughout Asia and Europe in the past decade. Here, we used 11 microsatellite DNA markers to analyze the genetic variation of 22 O. robiniae populations in China (the introduced region) and two additional US populations (the native region). A relatively high level of genetic diversity was detected among the introduced populations, even though they exhibited lower diversity than the native US populations. Evidence for genetic differentiation among the introduced Chinese populations was also found based on the high Fst value compared to the relatively low among the native US populations. Phylogenetic trees, structure graphical output, and principal coordinate analysis plots suggested that the Chinese O. robiniae populations (separated by up to 2,540 km) cluster into two main groups independent of geographical distance. Genetic variation has been observed to increase rapidly during adaptation to a new environment, possibly contributing to population establishment and spread. Our results provide insights into the genetic mechanisms underlying successful invasion, and identify factors that have contributed to colonization by an economically important pest species in China. In addition, the findings improve our understanding of the role that genetic structure plays during invasion by O. robiniae.

Invasive species often cause serious economic and ecological damage. Despite decades of extensive impacts of invasives on bio-diversity and agroforestry, the mechanisms underlying the genetic adaptation and rapid evolution of invading populations remain poorly understood. The black locust gall midge, Obolodiplosis robiniae, a highly invasive species that originated in North America, spread widely throughout Asia and Europe in the past decade. Here, we used 11 microsatellite DNA markers to analyze the genetic variation of 22 O. robiniae populations in China (the introduced region) and two additional US populations (the native region). A relatively high level of genetic diversity was detected among the introduced populations, even though they exhibited lower diversity than the native US populations. Evidence for genetic differentiation among the introduced Chinese populations was also found based on the high Fst value compared to the relatively low among the native US populations. Phylogenetic trees, structure graphical output, and principal coordinate analysis plots suggested that the Chinese O. robiniae populations (separated by up to 2,540 km) cluster into two main groups independent of geographical distance. Genetic variation has been observed to increase rapidly during adaptation to a new environment, possibly contributing to population establishment and spread. Our results provide insights into the genetic mechanisms underlying successful invasion, and identify factors that have contributed to colonization by an economically important pest species in China. In addition, the findings improve our understanding of the role that genetic structure plays during invasion by O. robiniae.

INTRODUCTION
Obolodiplosis robiniae (Haldeman, 1847) (Diptera: Cecidomyiidae) is a North American species of gall midge that has recently been extensively introduced throughout Asia and Europe (CABI/EPPO, 2011) and is continuously expanding its range (Cierjacks et al., 2013;Stalazs, 2014;Badmin, 2016;Kostro-Ambroziak and Mieczkowska, 2017). It is specifically associated with host plants from the genus Robinia (Fabaceae) (Stalazs, 2014). Its main host is Robinia pseudoacacia although it is occasionally found on R. pseudoacacia cv. 'Frisia' (Badmin, 2016). The gall midge causes leaf rolling and premature leaf shedding, resulting in the deterioration of the host and increased susceptibility to other pests, including wood borers such as longhorn beetles .
Obolodiplosis robiniae was first recorded in China (Qinhuangdao City, Hebei Province) in 2004  and has since spread extensively. Its primary host (R. pseudoacacia) has been planted extensively across China, and O. robiniae is now found in most of these areas (Shang et al., 2015a). Chinese O. robiniae populations may produce between four and six generations per year (Wang, 2009;Mu et al., 2010;Shao et al., 2010;Liu, 2014), which is significantly higher than the rate in regions beyond China. For example, O. robiniae produces three to four generations per year in Italy (Duso et al., 2005) and Serbia (Mihajlovic et al., 2008), and a maximum of three generations per year in Korea (Lee et al., 2009).
Genetic diversity and population structure are important factors affecting the colonization of invasive species (Amouroux et al., 2013;Horst and Lau, 2015;Zhao et al., 2015). Invasive species often exhibit low genetic diversity during founding events, as new habitats are typically colonized by only a few individuals, representing a small proportion of the allelic diversity present in the source population (Nei et al., 1975;Tsutsui et al., 2003). However, when the founding individuals originate from multiple source populations, the genetic diversity of the founder population can be relatively high (Davis, 2009). This can contribute to invasion success by facilitating local adaptation to new environments and increasing new trait diversity (Facon et al., 2006). Besides, the invaders can rapidly evolve in isolation from other individuals of the same species when they were introduced into the new environments (Lee, 2002;Launey et al., 2010).
Previously, Shang et al. (2015b) investigated the genetic variation among Chinese O. robiniae populations using a partial mitochondrial DNA cytochrome c oxidase subunit I (COI) sequence marker. However, only 10 individuals exhibiting haplotypic variation and a mere four haplotypes were detected in 560 O. robiniae samples. Thus, the genetic mechanisms behind successful invasion, the genetic structure in the process of colonization, and the phylogenetic relationships among the Chinese O. robiniae populations remain poorly understood.
Accordingly, to gain further insight into the genetic structure of the Chinese O. robiniae populations and ascertain how the species has spread widely in new regions, we used 11 microsatellite markers to analyze the genetic structure of 22 Chinese O. robiniae populations. Two native populations from the United States (US) were also assessed based on the same loci for comparison with the Chinese populations, in order to explain how genetic diversity is altered during the invasion process.

Sample Collection
We collected the gall midge larvae and pupae contained within rolled leaves of host trees growing in 22 cities across China (Figure 1). Generally, the rolled leaves were randomly picked from different trees; however, when infestation was low, individual trees were singled out for sample collection. Following collection, the rolled leaves were immediately transported to the laboratory in 60 cm × 40 cm plastic bags, in which they were maintained until adult emergence. From the samples collected at each location, 20 larger adults were selected, placed into a 1.5-mL centrifuge tube, and stored at −20 • C for subsequent DNA extraction. Additionally, eight O. robiniae adults from two regions of the United States were obtained from the Quarantine Lab at the Institute of Forest Ecology, Environment, and Protection in the Chinese Academy of Forestry. Details of the sample collection and population codes are listed in Table 1.

DNA Extraction and Microsatellite Analyses
Genomic DNA was extracted from the entire O. robiniae body following the instructions described by Zhou et al. (2007) and stored at −20 • C until needed. The 14 microsatellite loci (W3, W5, W8, W29, W31, W33, W35, W41, W46, W82, W83, W116, W126, and W132) developed by Yao et al. (2015) were initially selected to analyze the genotypes of 20 individuals per collection site (the exception being Zhengzhou, for which 16 individuals were analyzed) ( Table 1). For each sample, we attempted to amplify all 14 loci; however, after two attempts, we were unable to amplify five loci (W29, W35, W41, W46, and W116) for many individuals; thus, these loci were not used in subsequent analyses. However, we assessed the applicability of two additional loci (W6 and W107; GenBank numbers: KP260520 and KP260530) that were not characterized by Yao et al. (2015), and we detected sufficient polymorphism among the analyzed samples. Hence, a total of 11 loci were used to genotype 444 O. robiniae individuals.
(initial denaturation step); followed by 30 cycles of 94 • C for 30 s, 53 • C (W3, W5, W6, and W8) or 56 • C (the remaining loci) for 45 s, 72 • C for 45 s, extension at 72 • C for 10 min, and finally maintained at 16 • C. PCR products were examined using a DNA analyzer (Applied Biosystems, Waltham, CA, United States) and the results were analyzed using Genotyping was carried out using a 3730xl automated DNA sequencer (Applied Biosystems, Waltham, CA, United States). Alleles were scored using GeneMarker software version 2.2. (Softgenetics LLC, State College, PA, United States).

Data Analyses
Genetic diversity was estimated by basic statistical analyses including number of alleles (Na), effective number of alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), and Nei's (1973) expected heterozygosity (Nei), which were calculated using GenePop software version 4.3 (Rousset, 2008); genotype number (GN), gene diversity (GD), and polymorphism information content (PIC) were calculated using PowerMarker software version V3.25 (Liu and Muse, 2005). F-statistics and gene flow for each locus across populations were performed using PopGene software version 1.32 (Yeh et al., 2018). Deviations from the Hardy-Weinberg equilibrium (HWE) based on the Markov chain algorithm (10,000 steps) and linkage disequilibrium (LD) (10,000 permutations) were also examined by GenePop software. The genetic relationships between populations were assessed using a neighbor-joining dendrogram generated by PowerMarker and Molecular Evolutionary Genetics Analysis across Computing Platforms (MEGA X) (Kumar et al., 2018).
Populations differentiation was assessed by pairwise Fst values (based on 999 permutations) and gene flow (Nm) through AMOVA (analysis of molecular variance) approach which were performed using the Arlequin program version 3.5 (Excoffier and Lischer, 2010). The analyses can estimate variance and partitioning of the within-and among-population. Genetic structure analysis was performed with 100,000 Markov Chain Monte Carlo repetitions after a burn-in period of 200,000 interactions for each group number (K) using STRUCTURE software version 2.3.4 (Pritchard et al., 2000). The number of subpopulations (K) was assumed to be from 1 to 22, without admixture and with correlated allele frequencies.
To determine the most likely number of subpopulations, the optimum K-value was obtained by calculating the K value (Evanno et al., 2005). In addition, the Mantel test was conducted using the GenALEx 6.5 program (Peakall and Smouse, 2012) to determine correlations between Nei's genetic distance [was calculated using GenePop software based on Nei (1978)] and both geographical distance (km) and altitude (m). Significance was assessed by conducting 999 permutations. Moreover, a principal coordinate analysis (PCoA) was conducted using the same software.

Microsatellite Polymorphism and Diversity
In this study, locus polymorphism and diversity were determined based on 22 Chinese populations using 11 microsatellite markers, with each population consisting of 20 individual samples (except Zhengzhou, with 16 samples) ( Table 1). Amplifying these microsatellite markers loci led to 436 polymorphic bands (Supplementary Table S1), representing 202 genotypes, ranging from 3 to 54 per primer pair. As shown in Table 2 Figure S1). In addition, when we evaluated the within-sample HWE deviations for each locus across all 22 Chinese populations using the Markov chain algorithm (10,000 steps), we detected significant deviation from the expected value (p < 0.05) in 68 of 242 tests (28.10%) (Supplementary Table S2).
Examination of genotypic LD between all pairs of alleles across all 22 populations, based on a permutation procedure (10,000 permutations), revealed a significant LD (p < 0.05) in 251 of 1210 tests (20.74%) from 11 loci in the 22 Chinese populations.

Population Genetic Diversity
The genetic diversity of 22 Chinese populations and two US populations was assessed. Six indices of genetic diversity (Na, Ne, I, Ho, He, and Nei) were evaluated. As shown in Table 3, for each Chinese population across all loci, the means of the above indices except Na were moderately or considerably lower than those of the native US populations, although the sample size of the Chinese populations (40) was markedly higher than that of the US populations (8)

Genetic Differentiation in the Chinese Populations
The Fst per locus ranged from 0.1357 to 0.3770, with an average of 0.1994, and the Fis per locus ranged from −0.0037 (W3) to 0.3364 (W82) with an average of 0.0873 alleles per locus across populations (    exists among the sampling sites and there is some restriction in gene flow between them ( Table 4). The most noticeable genetic differentiation occurred between BJ and GY (Fst = 0.377), followed by GY and QH (Fst = 0.372), then DY and QH, DY and GY (Fst = 0.361 for both). According to the coefficient of genetic differentiation (Fst = 0.1830, p < 0.001), genetic variation within populations (81.66%) was substantially higher than that among populations (18.34%) (p < 0.001) ( Table 5). Gene flow (Nm) ranged from 0.414 (BJ and GY) to 11.05 (HF and WH) ( Supplementary Table S4), with an average of 1.113 ( Table 5).
All investigated loci contributed to the population differentiation (p < 0.001 for each individual locus). Regarding the native US populations, they had a relatively low Fst value (0.085, p < 0.03), a high Nm value (2.685), and variation within populations of 91%, while variation among populations was 9% (p < 0.02). This further indicates the existence of extensive genetic differentiation among the introduced populations.

Genetic Relationships and Population Structure Analysis
A dendrogram depicting the genetic relationships among the 22 Chinese populations was constructed based on the microsatellite data (Figure 2). The populations were divided into two main clusters, and each cluster was further separated into several sub-clusters. Besides, some subdivided populations were clustered according to their spatial distribution, such as GY, HF, and WH located in the south of southern China, which clustered together in group I, whereas TA, DD, BJ, and QH located around Bohai Bay clustered together in group II. The 436 Chinese O. robiniae samples were further assessed for population stratification using STRUCTURE software. Microsatellite data were analyzed with possible cluster numbers (K-values) ranging from 1 to 22. K was clearly maximized when K = 2 ( K = 4298.3743), indicating the occurrence of two distinct groups among the 22 populations (Figure 3), which validates the dendrogram-based grouping, and the second clade was grouped according to approximate geographical area. These results suggested different degrees of introgression in the populations, detected as differences in allelic frequencies among the populations. In addition, greater structuring (K = 3) revealed that QD, GY, and TY in group I had certain structural similarities to YT, SY, NJ, YC, and XA in group II.
In addition, PCoA based on the marker genotypes also revealed two distinct clusters of the Chinese populations (Figure 4), which were partly related to their geographical regions (group II contained populations from North China). The Mantel test revealed non-significant negative correlations between Nei's genetic distance (Supplementary Table S5

DISCUSSION
In the present study, we detected a high degree of polymorphism among the assessed microsatellite loci. We also identified a relatively high level of genetic diversity among Chinese O. robiniae populations across all loci, with the average expected heterozygosity (He) and Nei's (1973) expected heterozygosity (Nei) being 0.5346 and 0.5192, respectively. The highest He was 0.6606, which occurred in the US_g population despite the fact that it consisted of only three O. robiniae individuals. Nonetheless, He, Nei (Shang et al., 2016), gene diversity index  (GD), and polymorphism information content (PIC) (Ren et al., 2014) are minimally influenced by sample size. Our results indicate significant differences in genetic diversity within the Chinese populations, as well as between the native and invasive populations. Shang et al. (2015b) detected a relatively low level of genetic diversity among Chinese O. robiniae populations using a COI marker. This discrepancy suggests that COI markers may be less suitable than microsatellite DNA markers for population analyses of a new invasive species, such as Chinese O. robiniae, as is the case with another invasive cecidomyiid, Procontarinia mangiferae (Amouroux et al., 2013).
The relatively high level of genetic diversity has likely contributed to the spread of O. robiniae across China in the past decade to the extent that this species is now established in most regions where its host exists (Shang et al., 2015a). O. robiniae has a short history in China, with initial detection occurring in 2004 , whereas its host was introduced over a century ago (Xu and Yang, 2006). Moreover, for the 22 Chinese populations analyzed, our results show a relatively high level of genetic differentiation (Fst = 0.1830) among populations from sites separated by distances of up to 2,540 km. Hence, the relatively high genetic diversity likely caused by significant differentiation among populations has been conducive to the rapid colonization and establishment of O. robiniae in China, and it is an important factor contributing to the successful invasion of O. robiniae.
The observed population differentiation likely resulted from rapid evolution during adaptation to the new environment. Invasive species may evolve rapidly in response to selection pressures driven by novel habitats (Sakai et al., 2001;Lee, 2002;Ochocki and Miller, 2017), and such rapid genetic adaptation might be important for invasive species (Shine et al., 2011) in order to increase fitness and invasion success (Suarez and Tsutsui, 2008;Roux and Wieczorek, 2009). Increasing the success of both their initial establishment and subsequent range expansion is a particularly effective strategy for introduced populations, as was shown for several invasive species during their colonization processes (Simberloff et al., 2013;Ochocki and Miller, 2017;Wu et al., 2019). Hence, differentiation among Chinese populations was likely accelerated by the rapid evolution of adaptations to the new environments, promoting successful invasion by O. robiniae.
Furthermore, high levels of genetic diversity in an invasive species might be caused by multiple introductions or large founding populations. It is thought that multiple introductions are associated with increased diversity because they supply increased variation and new genetic communities (Dlugosch and Parker, 2008). Multiple introductions are considered to produce invasive populations that are much more genetically diverse than a single source population (Sakai et al., 2001). As such, the successful establishment and invasion of many invasive species have been attributed to multiple introductions (Meixner et al., 2002;Kolbe et al., 2004;Cheng et al., 2008;Zalewski et al., 2010;Michaelides et al., 2018). For O. robiniae, in light of the short history in China, its high genetic diversity and colonization success might be also related to multiple independent invasive events.
Many studies have shown that the genetic structures of invasive species are well developed in their new ranges (Zeisset and Beebee, 2003;Herborg et al., 2007;Rollins et al., 2009;Zalewski et al., 2010). Indeed, the genetic diversity of invasive species is often higher than that of native populations (Marrs et al., 2008). In light of this, our neighbor-joining dendrogram and Bayesian STRUCTURE (K = 2) analyses indicated that the Chinese O. robiniae populations are divided into two independent clusters, although this division appears to be unrelated to geographical distribution. Meanwhile, gene flow was found among some Chinese populations, with QD, GY, and TY in group I having highly similar structures to YT, SY, NJ, YC, and XA in group II. In addition, some subgroups (GY, HF, and WH; TA, DD, BJ, and QH) were clustered according to their geographical distribution, which likely represents different routes of spread in the new environment. Furthermore, our results revealed that each group of the Chinese O. robiniae populations exhibited differences in geographical distribution and genetic distance, suggesting that the two groups do in fact represent two different sources. This implies the introduced populations likely experienced two independent invasive events, which initially shaped the genetic structure of the Chinese O. robiniae populations.
On the other hand, the fact that the division of the two groups was unrelated to geographical distribution (particularly for group I, which contained numerous genetically similar populations located in different geographical regions) suggests that human activity is likely another important contributing factor. Transport, business trips, and long-distance vacations have recently increased not only in frequency but also in distance. High levels of human-mediated dispersion can increase the genetic diversity of an invasive population, thereby substantially modifying the genetic structure and potential management units (Perkins et al., 2013); these factors decrease the success of control measures for O. robiniae populations.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the article/Supplementary Material.

FUNDING
This work was funded by the Fundamental Research Funds of the Chinese Academy of Forestry (CAFYBB2017SZ003) and the National Key R&D Program of China (2016YFC1201200).