Characterisation of microsatellite and SNP markers from Miseq and genotyping-by-sequencing data among parapatric Urophora cardui (Tephritidae) populations

Phylogeographic analyses of the gall fly Urophora cardui have in earlier studies based on allozymes and mtDNA identified small-scale, parapatrically diverged populations within an expanding Western Palearctic population. However, the low polymorphism of these markers prohibited an accurate delimitation of the evolutionary origin of the parapatric divergence. Urophora cardui from the Western Palearctic have been introduced into Canada as biological control agents of the host plant Cirsium arvense. Here, we characterise 12 microsatellite loci with hexa-, penta- and tetra-nucleotide repeat motifs and report a genotyping-by-sequencing SNP protocol. We test the markers for genetic variation among three parapatric U. cardui populations. Microsatellite variability (N = 59 individuals) was high: expected heterozygosity/locus/population (0.60–0.90), allele number/locus/population (5–21). One locus was alternatively sex-linked in males or females. Cross-species amplification in the sister species U. stylata was successful or partially successful for seven loci. For genotyping-by-sequencing (N = 18 individuals), different DNA extraction methods did not affect data quality. Depending on sequence sorting criteria, 1,177–2,347 unlinked SNPs and 1,750–4,469 parsimony informative sites were found in 3,514–5,767 loci recovered after paralog filtering. Both marker systems quantified the same population partitions with high probabilities. Many and highly differentiated loci in both marker systems indicate genome-wide diversification and genetically distinct populations.


INTRODUCTION
In situ parapatric divergence across an environmental gradient is expected to result in a combination of non-concordant genetic clines among loci under selection (and loci linked to them), due to different selection coefficients on these loci, and a majority of ''neutral'' loci structured by a balance between genetic drift and gene flow (Coyne & Orr, 2004). Because environmental adaptation is expected to act on few loci only, genetic divergence across the genome should create ''islands'' of divergence rather than genome-wide divergence patterns in the presence of gene flow. However, recent genomic studies of incipient species with gene flow suggest the latter (Michel et al., 2010;Renuat et al., 2013), in contradiction to expectations.
The local population structure of U. cardui in the native range is characterised by meta-population dynamics and isolation-by-distance gene flow (Zwölfer, 1979;Seitz & Komma, 1984;Eber & Brandl, 1994;Eber & Brandl, 1996;Johannesen & Seitz, 2003). However, Steinmetz, Johannesen & Seitz (2004) reported a narrow (ca. 70 km), linear genetic transition zone across contiguously distributed populations on the Jutland (Cimbrian) peninsula where genetic variance between populations on each side of the transition highly exceeded that explained by recurrent extinction-colonisation processes. The study, based on allozyme loci, identified non-concordant clines at three loci and no clines at another four. Johannesen, Drüeke & Seitz (2010) reported dominance of the same mtDNA haplotype on both sides of the transition zone that belonged to a Western European lineage experiencing population expansion. This mix of allelic distributions across the transition area suggested an in situ origin of the parapatric divergence rather than one caused by secondary contact of two allopatrically evolved populations, i.e., by vicariance. However, a precise estimate of the relative number of loci exhibiting clines as well as modes of allele frequency shifts was prohibited due to low levels of polymorphism.
Here, we report the development of 12 highly polymorphic microsatellite loci and a genotyping-by-sequencing (GBS) protocol for SNP identification with the aim of reaching first insights into genome-wide vs. island diversification among parapatric U. cardui populations, specifically, and for the study of population genetic patterns in native and introduced populations of U. cardui in general. The two marker systems are characterised by different levels of per-locus polymorphism and they differ in locus-specific recovery (amplification) per individual. Microsatellites have great applicability due to locus-specific amplification per individual, high allelic variation per locus and Mendelian inheritance. In comparison, GBS provides high-density, genome-wide (mostly) bi-allelic polymorphisms, but the recovery of specific loci (locus amplification success) may be uneven among individuals. The markers may, alone or in combination, offer helpful tools relative to the specific scientific enquiry and/or sampling strategy (Hodel et al., 2016). We quantify genetic diversity and differentiation among one population north, within and south of the transition zone, respectively, and we assess the information value of the markers systems relative to allozyme genetic divergence observed 14 years previously (Steinmetz, Johannesen & Seitz, 2004).

Microsatellites
All microsatellite loci amplified consistently in the 59 investigated U. cardui. The tetra, penta and hexanucleotide repeat lengths allowed precise binning of alleles. Both standard fragment length variation as well as intra-repeat variation, e.g., 3 bp shifts in hexanucleotide repeats, were easily discriminated. Because all loci were amplified with identical PCR conditions, other multiplex locus combinations could be possible in future studies.
We observed 5-21 alleles per locus across the three U. cardui populations ( Table 2). The expected heterozygosity per locus per population was 0.60-0.90. Locus Uc12 was differentially sex-linked: both sexes lacked heterozygotes in Vildbjerg north of the transition zone (F IS = 0.874), predominately in females in Frøslev within the transition zone (F IS = 0.389) but in males in NMS south the transition zone (F IS = 0.418) ( Table 2). F IS values of 0.50 are expected when haploid and diploid loci (or organisms) are equally represented in analysis. Loci Uc05 and Uc06 did not obey Hardy-Weinberg proportions in the population Frøslev after Bonferroni correction (P < 0.05). Linkage disequilibria were not found in the northern population but they were observed in the transition zone (Frøslev, N pairs = 6) and in the southern population (NMS, N pairs = 1) after Bonferroni correction (P < 0.05). Most LD involved the sex-linked locus Uc12. There was no evidence for stuttering or large allele drop out at any locus. The method of Evanno, Regnaut & Goudet (2005) identified K = 2 as the highest level of structure (Figs. 1A and 1B) whereas Clumpak summary results revealed an approximately equal probability for K = 2 and K = 3 (Figs. 1C and 1D), particularly when including population priors. For K = 2: mean(LnProb) = −2855.010, mean(similarity score) = 0.997; for K = 3: mean(LnProb) = −2694.350, mean(similarity score) = 0.989. Mean expected heterozygosity/population across loci was North H e = 0.788, Frøslev H e = 0.810, South H e = 0.792. All 12 loci were significantly differentiated among populations, 0.063 < F ST < 0.177, P < 0.001. Significant Table 2 Microsatellite diversity estimates for Urophora cardui, and allele size range for U. stylata. The three U. cardui populations Vildbjerg, Frøslev and Neumünster (NMS) are located north, within and south of a genetic transition zone, respectively. F IS , inbreeding index; H e , expected heterozygosity; N a , number of alleles. N a is shown as a grand total and per population (in brackets). The size range estimates for U. stylata were based on the combined results from 15 individuals sampled in Denmark and Germany (see 'Materials and Method'). Estimates of deviations from F IS were calculated with Genepop on the web (Raymond & Rousset, 1995;Rousset, 2008)   Cross-species amplification in U. stylata was successful for four loci (Uc04, Uc09, Uc11, Uc12), whereas three loci (Uc02, Uc03, Uc07) amplified inconsistently using the amplification protocol used for U. cardui. Fragment sizes of Uc03 partially overlapped with those of Uc09 (both PM1), making an evaluation of the less well amplifying Uc03 difficult. Two loci, Uc05 and Uc10 (both PM2), amplified identical fragment profiles. Future studies of U. stylata, or other Urophora spp., should take notice of re-evaluating amplification protocols and primer mix compositions.

GBS
Neither DNA preparation/extraction methods nor treatments with or without RNase affected GBS library quality and sequencing success. Although treatment with RNase significantly improved the mean ratio of 260/280 by approaching ≈1.80 (no RNase = 2.066, with RNase = 1.864; t = 9.59, 16 df , P < 0.001), the mean number of reads per pool was higher, though not significantly, for DNA without RNase treatment (Appendix S2). PyRAD analysis quantified 645,164-2,640,743 reads and 7,615-10,002 loci per individual. The mean depth of clusters per individual with depths greater than 7 was 31.7-88.1. Assembled GBS reads have NCBI accession numbers SAMN06241878-SAMN06241895. The parameter settings ''minimum coverage of individuals'' (mincov) = 10 and 14 and ''maximum number of heterozygote samples'' (maxSH) = 4, 6, 8 and 10 identified between 1,177 (14 mincov-4 maxSH) and 2,347 (10 mincov-10 maxSH) unlinked SNPs and 1,750 and 4,469 parsimony informative sites, respectively (Fig. 2). Regression analysis showed that the frequency of polymorphic sites, 0.0034-0.0040, did not increase significantly with the number of reads (t = −0.61, df = 16, P = 0.55) whereas expected heterozygosity did (t = 2.46, df = 16, P = 0.03). The latter was caused by a significant positive relationship between the number of reads and the number of loci recovered for analysis, t = 3.59, df = 16, P > 0.01. Post hoc analysis showed that this result was influenced by the three individuals with the lowest number of recovered loci (ca. 200-300 loci less than the other individuals) and implied that a plateau was reached above c. 1,000,000 reads using in our protocol. The summary results are presented in Appendix S3.

DISCUSSION
The microsatellite and GBS markers characterised in this study identified identical divisions among three parapatric U. cardui populations, thus providing evidence for genetic separation of these populations since the first observation of genetic divergence in 2001 (Steinmetz, Johannesen & Seitz, 2004). Many and highly differentiated loci in both marker systems suggest discrete genetic differences among the studied populations. Although our Miseq data did not permit making a high-resolution genome assembly of U. cardui for mapping the distribution of SNP diversification across the genome, in silco digestion of scaffolds of the tephritids C. capitata and B. tryoni with EcoR1 found similar fragment length distributions among both scaffolds and species. Hence, we predict genetic divergence among the parapatric U. cardui populations will be present in independent genome regions. Indeed, genomic studies of parapatric/sympatric populations/species have found genome-wide divergence patterns (Renuat et al., 2013;Michel et al., 2010;Feulner et al., 2015;Marques et al., 2016), which conflicted with theoretical expectations of genomic islands of divergence for local environmental adaptation in the presence of gene flow.
The two marker systems provided different locus-specific levels of genetic diversity that alone or in combination may give insights into such diverse research fields as paternity testing, hybrid status and phylogeograhy. Although the marker systems found identical population-divisions they also differed in estimates of individual affiliations, and GBS was partly sensitive to paralog filtering assumptions. The differences are partly due to different levels of locus-specific heterozygosity, where highly multi-allelic microsatellite loci increase the probability of detecting close relatives but lower the likelihood of assigning ''unrelated individuals'' to a population when the number of individuals is low relative to the level of polymorphism per locus. The lower estimate of mean genetic differentiation for microsatellites, F ST = 0.109, being half of that observed for GBS, F ST = 0.197, is related to higher per-locus variability of the former marker system. Still, F ST > 0.10 is considered high for highly polymorphic microsatellites, and similar estimates have been found in host-race studies (e.g., Kempf et al., 2009;Imo, Maixner & Johannesen, 2013).
For GBS, the probability of population assignment increased with the number of unlinked SNPs, but even the relatively low number recovered with the conservative parameter set 14-6 ( Fig. 2) identified individual memberships to three populations with high probabilities (Fig. 3). Considering that the individual GBS coverage in 1/5 lane was mostly >40, increasing the number of individuals per pool without quality loss should be possible. Standard DNA extraction procedures did not influence data quality, given adequate concentrations of DNA.
The likelihood for genome-wide differentiation and the observation of alternative sex linkage at microsatellite locus 12 might indicate that genomic rearrangements and/or sex determination are involved in diversification among the parapatric populations. Sex determination in Diptera can vary considerably (Vicoso & Bachtrog, 2015), and in the Tephritidae both male and female heterogamy is known (Bush, 1966). In U. cardui (2N = 12), where the sex-determination system is not known (Mainx, 1976), karyotype variation in the number of dot chromosomes has been found in southern Germany (2N = 13-14) (Ponisch & Brandl, 1992). Verifying rearrangements and/or sex linkage with GBS data will require more samples than used here due to the combined action of different sex linkage among populations and variable genome representations among individuals. Future studies of U. cardui from the transition area should focus on whether genomic rearrangements and sex determination interact with mating behaviour and influence rates of gene flow.