Demographic history has shaped the strongly differentiated corkwing wrasse populations in Northern Europe

Understanding the biological processes involved in genetic differentiation and divergence between populations within species is a pivotal aim in evolutionary biology. One particular phenomenon that requires clarification is the maintenance of genetic barriers despite the high potential for gene flow in the marine environment. Such patterns have been attributed to limited dispersal or local adaptation, and to a lesser extent to the demographic history of the species. The corkwing wrasse (Symphodus melops) is an example of a marine fish species where regions of particular strong divergence are observed. One such genetic break occurred at a surprisingly small spatial scale (FST ~0.1), over a short coastline (<60 km) in the North Sea‐Skagerrak transition area in southwestern Norway. Here, we investigate the observed divergence and purported reproductive isolation using genome resequencing. Our results suggest that historical events during the post‐glacial recolonization route can explain the present population structure of the corkwing wrasse in the northeast Atlantic. While the divergence across the break is strong, we detected ongoing gene flow between populations over the break suggesting recent contact or negative selection against hybrids. Moreover, we found few outlier loci and no clear genomic regions potentially being under selection. We concluded that neutral processes and random genetic drift e.g., due to founder events during colonization have shaped the population structure in this species in Northern Europe. Our findings underline the need to take into account the demographic process in studies of divergence processes.

The observed patterns of divergence may be characterized by: (a) isolation-by-distance, where spatially separated individuals are less likely to encounter and hence mate; (b) isolation-by-adaptation, where locally adapted populations produce maladaptive or unviable hybrids when faced with gene flow, including Dobzhansky-Muller models of hybrid incompatibility; and (c) isolation-by-colonization, where the path and colonization history across the seascape and barriers may continue to restrict gene flow (Nadeau, Meirmans, Aitken, Ritland, & Isabel, 2016;Orsini, Vanoverbeke, Swillen, Mergeay, & De Meester, 2013;Spurgin, Illera, Jorgensen, Dawson, & Richardson, 2014).
After the last glacial maximum (~21 kya), serial colonization and founding events along the recolonization routes have shaped the biota on the northern hemisphere (Hewitt, 2000;Taberlet, Fumagalli, Wust-Saucy, & Cosson, 1998). Re-colonization has some common features among species, such as a general loss of genetic variation with increasing latitude, but the reconstructed histories tend to be quite complex and sometimes species-specific, involving glacial refugia, isolated pockets and secondary contact, as exemplified by terrestrial plants (Francois, Blum, Jakobsson, & Rosenberg, 2008;Kyrkjeeide, Stenøien, Flatberg, & Hassel, 2014;Petit et al., 2002). Similarly, many marine species also carry clear genetic signals of post-glacial range-expansions (Jenkins, Castilho, & Stevens, 2018). During the last glacial maximum, cold-adapted fish species are believed to have persisted in Northern Europe, while temperate fish species, such as the wrasses, found refuge in the Mediterranean and the surrounding coast of the Iberian Peninsula (Kettle, Morales-Muñiz, Roselló-Izquierdo, Heinrich, & Vøllestad, 2011).
The genetic makeup of several temperate wrasse fish species follow this classical pattern of loss of genetic variation with increasing latitude, as seen for ballan wrasse, Labrus bergylta, (Almada et al., 2017) and corkwing wrasse, Symphodus melops (Robalo et al., 2012). The corkwing wrasse has emerged as a particularly interesting case due to two substantial genetic breaks, across the North Sea (F ST = 0.15) and over a narrow coastal barrier with unsuitable sandy habitats (~60 km; F ST = 0.11) in southwestern Norway (Blanco Gonzalez, Knutsen, & Jorde, 2016). In addition, as the corkwing wrasse is currently exploited as "cleaner fish" in aquaculture (Blanco Gonzalez & de Boer, 2017), this conspicuous genetic break demands clarification, in particular as individuals are translocated across the genetic break and members of the two populations can interbreed (Blanco Gonzalez et al., 2019;Faust, Halvorsen, Andersen, Knutsen, & Andre, 2018).
Genome-wide patterns of differentiation are particularly informative in elucidating if reproductive isolation is driven by directional selection (Feder & Nosil, 2010) or random genetic drift (Nielsen, 2005). Somewhat simplified, a classical strong selective sweep should display a local genomic signal, with "hitchhiking" neutral markers in proximity of the beneficial variant (Feder & Nosil, 2010).
On the other hand, isolation-by-colonization should demonstrate a global and random pattern of genome-wide differentiation, a result of the stochastic fluctuations of variant frequencies imposed by for instance a founding event (Nielsen, 2005).
While population genetic methods are typically used to investigate patterns of population divergence, analyses using demographic inference to explicitly test different scenarios of divergence are rarely undertaken (Rougemont & Bernatchez, 2018). Here, we make use of whole genome resequencing methods to analyze the divergence between populations of corkwing wrasse in Northern Europe and to investigate demographic histories and putative patterns of reproductive isolation of this rocky shore marine fish.

| Samples and genotyping
Sixty-five corking wrasses were sampled from eight coastal locations from three regions: the British Isles, western and southern Scandinavia (Table 1). Samples from southern Norway were collected by beach seine, while those from the west coast of Norway, Sweden and the British Isles were collected by fish pots, as described in (Blanco Gonzalez et al., 2016). Muscle tissues were taken from fresh or frozen specimens and stored in 96% ethanol prior to DNA extraction. Total genomic DNA was extracted with the DNeasy kit (Qiagen) or the E.Z.N.A. Tissue DNA kit (Omega Bio-Tek) and resuspending the DNA in TE buffer. The extractions TA B L E 1 Regional groupings of corkwing sampling locations by region, location, code, sampling year, latitude, longitude, sample size (n) and number of variable single nucleotide polymorphisms (SNPs) per site (minor allele count > 1) Garrison & Marth, 2012), using the following quality control criteria: (a) quality >40; (b) minimum and maximum read depth of ×4 and ×30; (c) maximum 5% missing genotypes; (d) minimum minor allele count of 3 (MAF >2%). Two data sets were made: (a) all SNPs with ancestral states and (b) a thinned data set keeping random SNPs equally spaced by 10,000 bp and excluding rare variants (MAF >2%, thinned with "-bp-space 10,000").
The ancestral allele states were inferred using whole-contig alignments between the corkwing and ballan wrasse (L. bergylta) genome assemblies (Lie et al., 2018;Mattingsdal et al., 2018) constructed by last (v923; Frith, Hamada, & Horton, 2010); both species are members of the Labridae family. First, the genomes were indexed specifying the "YASS" and "R11" options, optimizing for long and weak similarities and masking low-complexity regions.
Then, a pairwise genome-wide alignment between corkwingand ballan wrasses was made, setting minimum E-value to 0.05 and maximum matches per query position = 100. The "last-split" function was run twice to ensure 1-1 alignments. The multiple alignments were converted to bam format and SNP positions in the corkwing wrasse genome used to extract "genotypes" in the corkwing and ballan wrasse alignment using samtools and bcftools . The inferred ancestral states were manually controlled and plink v1.90b3.40 (Purcell et al., 2007) was used to annotate the ancestral state as the reference allele. Missing data were imputed and phased using beagle default settings (Browning & Browning, 2013). To elucidate demographic relationships between the populations, we searched for identical-by-decent (IBD) haplotypes inferred by beagle (Browning & Browning, 2013), which accounts for haplotype phase uncertainty.

| Population structure and admixture
Single nucleotide polymorphism-wise F ST values between populations were calculated using the F ST of Weir and Cockerham (1984) implemented in vcftools (v0.1.13; Danecek et al., 2011). Patterns of population structure were investigated by Multidimensional scaling (MDS) analysis and inbreeding coefficients using plink v1.90b3.40 (Purcell et al., 2007). Proportion of ancestry for each individual, Q, for each putative ancestral population, K, was estimated using admixture (v1.3.0; Alexander, Novembre, & Lange, 2009), making use of the integrated five-fold cross-validation scheme for 10 iterations each for K = 2-6, each using different random seed.
In an idealized diploid population, the identity-by-descent (IBD) haplotype lengths are exponentially distributed in an organism with a mean of 1/(2*generations) Morgans (Thompson, 2013).  Table 1 for sample information) were selected.
Gene flow and diversity between locations relative to geographical distance were estimated using eems (Petkova, Novembre, & Stephens, 2016), which models effective rates of gene flow using the

| Gene flow
Gene flow across the genetic break was estimated by calling discrete local ancestry using pcadmix (Brisbin et al., 2012). Individuals from the ST and EG sites defined the admixed sample (N = 16) and the remaining samples from Scandinavia used as "South" (N = 24) and "West" (N = 16) ancestors. Simplified, pcadmix uses a PCA based algorithm of phased SNPs in a sliding window projecting the admixed samples using PCA loadings from the ancestral populations and calls local ancestry. Here, we used our previously phased dense SNPs data set and specified a fixed window size of 50 Kbp (-wMb 0.05). To reduce bias introduced by artificial breakpoints by our genome assembly, we used SNPs located on contigs >N50 (461 Kbp). Fifty Kbp bins of southern origin were denoted by "0" and bins with a western origin "1". We inserted a flag "9" to signify breaks introduced by a new contig. The r function "rle" was used to count the length of consecutive "0" and "1" for each haplotype for each contig (R Core Team, 2017).

| Demographic history
Demographic histories were estimated using two Markovian coalescent methods, psmc (v 0.6.5-r67; Li & Durbin, 2011) (Mazet, Rodriguez, Grusea, Boitard, & Chikhi, 2016), so the analyses were performed separately for each of the three regions excluding possible admixed samples. In the psmc analysis, one random individual from each of the most geographically distant locations were selected (ARD15, SM111 and GF01) using a similar approach to the one described in (Barth, Damerau, Matschiner, Jentoft, & Hanel, 2017), setting minimum and maximum read depth at six and 30 and base quality >30. Then the resulting fastq files were converted to psmc input format specifying quality threshold >20. psmc was run using default parameters, followed by 100 bootstraps.
A diffusion approximation method was implemented, dadi Optimize_Routine) which sequentially refines the perturbation of parameters, was used (Portik et al., 2017). The optimization function included four rounds each with 10, 20, 30 and 40 replications, increasing maximum iterations (three, five, 10 and 15) and decreasing fold in parameter generation (3, 2, 2 and 1), resulting in 100 replications. We looped the aforementioned algorithm 10 times, yielding 1,000 local minima from the four models. The best model and its parameters were subjected to a goodness-of-fit test (Optimize_ Functions_GOF) generating simulated parameters and these were used to assess the significance of the empirical parameters (Portik et al., 2017). Coalescent parameters were converted as follows: ancestral effective population size (N e ) was calculated by N e = θ/4µl, where θ was the scaled population parameter, µ was the mutation rate per site and per generations, and l the length of analyzed sequence.
Thinning the data set to one SNP per 10 kbp effectively reduced the length of the analyzed sequence by a factor of ~18, resulting in 35 Mbps. Migration was calculated as m = M/2N e and time in years as t = 2TN e × g, using g = 3 as generation time (Halvorsen et al., 2016(Halvorsen et al., , 2017Uglem et al., 2000).

| Genotyping
The whole genome resequencing analysis generated a total of 3,048 million reads. Approximately 0.8% of these reads were du- This highly dense data set was further reduced to keeping one SNP per 10 Kbp, using vcftools ("bp-thin 10,000"), yielding a reduced data set of 50,130 SNPs, denominated as the "thinned data set".
Due to a relatively low minimum read depth filter (×4) it is likely that the proportion of heterozygous SNPs is underestimated, which can introduce a systematic error especially in windowed analyses which rely on breakpoints like IBD haplotypes (Meynert, Bicknell, Hurles, Jackson, & Taylor, 2013).

| Population structure and sequential loss of genetic variation
The number of SNPs within each sampling location suggests a pattern of sequential loss of diversity among regions, initially from the British Isles to western Scandinavia and followed by a further reduction to southern Scandinavia (Table 1) The inbreeding coefficient ("plink -het") also revealed that the EG site was somewhat less homozygous compared to the other southern Scandinavian sites ( Figure S1).

| Stochastic genome-wide differentiation
Searching for localized signals of differentiation and candidate regions of selection we explored the genome-wide pattern of variation between the two Scandinavian populations. The analysis revealed a strong and global genome-wide pattern of differentiation ( Figure   S2). Across the genome five regions showed F ST values >0.9 and 32 regions F ST >0.8. The haplotype based test, hapflk, returned a similar pattern but with more distinct candidate regions, albeit none passed the threshold for statistical significance (q-value <0.05). Testing for outliers between the western and southern populations, bayescan results yielded two significant loci possibly under diversifying selection, SYMME_00001686_632632 and SYMME_00023564_399441 ( Figure S3). The frequency of these two loci is 0.72 and 0.89 in the western population, and both loci were monomorphic in the southern population. The most differentiated SNPs can be informative in population discrimination and are listed in Table S2.

| Gene flow across the genetic break
We used the default parameters in pcadmix thereby removing SNPs in high LD (r 2 > .8) and monomorphic SNPs in the ancestral samples.

| Demographic history and founding events
The analysis of psmc and smc++ is based on the "all SNP" data set, while dadi analysis was conducted using the "thinned SNPs" The isolation with migration model was the most likely scenario for the three comparisons analyzed in dadi. Among all 2D (two populations) models, the secondary contact projection yielded the best log likelihood and AIC statistic (Table S3). Nevertheless, we observed increased residuals on the rare frequency range, suggesting difficulties in modelling the loss of variation ( Figure S5).  (Table S3). Scandinavia ( Figure S1). The loss of SNPs is dramatic, as ~700 k SNPs detected in the British Isles are reduced to ~590 k (~16% less) SNPs in western Scandinavia with a further reduction to ~450 k (~35% less) SNPs in southern Scandinavia (Table 1), suggesting the direction and sequence of possible founding events to follow the British Isles-western Scandinavia-southern Scandinavia route, as previously suggested (Robalo et al., 2012). The pattern of genome-wide divergence (F ST and hapflk; Figure (Almada et al., 2017;Evankow et al., 2019;Hoarau, Coyer, Veldsink, Stam, & Olsen, 2007;Kettle et al., 2011;Maggs et al., 2008;Quintela et al., 2016). The colonization of Scandinavia ~10 kya, coincides with the deglaciation in western Norway (Stroeven et al., 2016). The demographic history date estimates inferred by the two Markovian approaches should be considered approximations, as the simulations rely on accurate generation time, mutation rates and sex ratio (Spence, Steinrücken, Terhorst, & Song, 2018). These values are intrinsically challenging for a species like corkwing wrasse, considering the variance in reproductive behaviour and generation time displayed by the species along the latitudinal gradient covered in this study (Halvorsen et al., 2017). The fact that psmc and smc++ do not adjust for periods of gene flow between populations and assumes clean population splits demands some care when interpreting changes in effective population sizes, and validation of findings using other methods are encouraged (Beichman, Phung, & Lohmueller, 2017). The method using diffusion approximations of the joint frequency spectrum implemented in dadi is frequently used to model complex scenarios of gene flow between populations (Rougemont et al., 2017;Tine et al., 2014). Here, we only tested simple scenarios, as more complicated models (Rougeux, Bernatchez, & Gagnaire, 2017) failed to converge and tended to produce artificial fits and parameters (data not shown). Using the site-frequency spectrum (SFS), the model which best fitted the empirical spectrum of all three comparisons was the secondary contact model (British Isles vs. western Scandinavia, British Isles vs. southern Scandinavia and western Scandinavia vs. southern Scandinavia, Table S3).

| D ISCUSS I ON
The third line of evidence is in the distribution of shared haplotypes (identical-by-decent) between the populations which corroborate the findings from the demographic history ( Figure S6 In conclusion, our findings shed new light on the dynamics underlying the presence of two genetic breaks of this species in Northern Europe (Blanco Gonzalez et al., 2016;Knutsen et al., 2013;Robalo et al., 2012). It also serves to remind us that more simple scenarios involving sequential recolonization and associated founder events combined with secondary contact could underlie instances of strong genetic breaks, without having to invoke more elaborate scenarios of selection and environmental adaptation (Hewitt, 1999;Ravinet et al., 2017;Schluter & Conte, 2009). Yet, while we could associate contemporary patterns of genetic differentiation to historical demographic events rather than adaptation, isolating mechanisms between western and southern Scandinavian populations still need further clarification, including a possible polygenic model of adaptation. In conclusion, corkwing wrasse could become an interesting future model for complementing and exploring the full span of possible dynamics that can lead to distinct contact zones, ranging from selectively neutral population history and structure to strong selection.

ACK N OWLED G EM ENTS
The sequencing service was provided by the Norwegian Sequencing

AUTH O R CO NTR I B UTI O N S
The study was conceived and designed by P.E.J, H.K, S.J and E.B.G.

DATA AVA I L A B I L I T Y S TAT E M E N T
Sequence reads are available through NCBI sequence read archive by accession number PRJNA354496. SNPs (Mattingsdal, 2019) can be obtained through: doi.org/10.6084/m9.figshare.7570907. v1.