Targeted resequencing reveals geographical patterns of differentiation for loci implicated in parallel evolution

Parallel divergence and speciation provide evidence for the role of divergent selection in generating biological diversity. Recent studies indicate that parallel phenotypic divergence may not have the same genetic basis in different geographical locations – ‘outlier loci’ (loci potentially affected by divergent selection) are often not shared among parallel instances of phenotypic divergence. However, limited sharing may be due, in part, to technical issues if false‐positive outliers occur. Here, we test this idea in the marine snail Littorina saxatilis, which has evolved two partly isolated ecotypes (adapted to crab predation vs. wave action) in multiple locations independently. We argue that if the low extent of sharing observed in earlier studies in this system is due to sampling effects, we expect outliers not to show elevated FST when sequenced in new samples from the original locations and also not to follow predictable geographical patterns of elevated FST. Following a hierarchical sampling design (within vs. between country), we applied capture sequencing, targeting outliers from earlier studies and control loci. We found that outliers again showed elevated levels of FST in their original location, suggesting they were not generated by sampling effects. Outliers were also likely to show increased FST in geographically close locations, which may be explained by higher levels of gene flow or shared ancestral genetic variation compared with more distant locations. However, in contrast to earlier findings, we also found some outlier types to show elevated FST in geographically distant locations. We discuss possible explanations for this unexpected result.


Introduction
Parallel phenotypic divergence occurs where the same traits diverge repeatedly in different geographical locations, often associated with similar environmental transitions. Parallel divergence coupled with the repeated evolution of reproductive isolation provides fascinating evidence for the role of selection as a driving force in speciation (Schluter & Nagel 1995;Elmer & Meyer 2011): When similar patterns of divergence occur repeatedly, this is more likely to be driven by a directional process than by chance.
With the availability of genomic tools, researchers have now begun to focus on the genetic basis of phenotypic divergence and speciation (Seehausen et al. 2014). For example, genome scans can identify genomic regions potentially affected by divergent selection (directly or via linkage disequilibrium) (Luikart et al. 2003;Butlin 2010). These methods utilize data from large numbers of genomic markers to identify 'outlier' genomic regions showing increased differentiation (typically measured as F ST ) compared with the rest of the genome (Lewontin & Krakauer 1973;Beaumont 2005). Outliers have been identified in multiple different study systems, including some where parallel phenotypic evolution has been described (e.g. Nosil et al. 2009). These systems are particularly interesting because they allow us to ask whether the same alleles, genes or genetic pathways underlie parallel phenotypic patterns of divergence. Answering this question will help in understanding the repeatability of adaptation at the molecular level (Elmer & Meyer 2011;Conte et al. 2012). The extent of 'sharing' of outliers among instances of parallel evolution should be indicative of the extent to which the genetic basis of divergence is the same. Studies using candidate genes or QTL mapping approaches often find reuse of the same genes in adaptation to similar selection pressures, especially among closely related species (Conte et al. 2012). However, these analyses may be biased towards genes of large effect (Rockman 2012) and candidate genes are generally more likely to appear as shared (Conte et al. 2012). In contrast to these results, recent genome scans focusing on populations within species have shown that parallel phenotypic divergence may not necessarily be accompanied by similar genetic patterns: that is, outlier loci may not be the same across different locations, even if they are geographically close (Deagle et al. 2012;Kautt et al. 2012;Gagnaire et al. 2013;Perrier et al. 2013;Roda et al. 2013;Soria-Carrasco et al. 2014;Westram et al. 2014).
The extent of sharing detected in genome scans may be influenced by both biological/evolutionary explanations and technical artefacts (Narum & Hess 2011). To understand the genetic basis of (parallel) divergence, we need to tease apart these alternatives and improve the technical aspects of the methodology to avoid false positives.
A similar genetic basis of divergence is expected especially when there is a strong connection by gene flow that may transport adaptive alleles (e.g. evolution in concert; Johannesson et al. 2010), or when there is a lot of shared standing genetic variation among instances of parallel divergence (e.g. Hohenlohe et al. 2010). In these cases, shared outliers should be identifiable in genome scans, although this method may sometimes fail when patterns of linkage disequilibrium between marker loci and the actual targets of selection differ among populations. On the other hand, low outlier sharing may be expected when the genetic variation available for adaptation is substantially different among locations. In this scenario, solutions for the same problem evolve independently and might therefore be based on different novel mutations, or on different components of old variation that have been maintained only locally. Such a pattern may be more likely for polygenic traits, which allow for a larger number of potential 'genetic solutions'. In this case, the phenotypic outcome (i.e. adaptation) may be similar, and outliers might still belong to the same metabolic pathways (Roda et al. 2013), but the markers appearing as outliers may be different. Such independent evolution is expected when instances of divergence are separated by long evolutionary distances (reducing the amount of shared standing genetic variation) and/or geographical distances (less connected by gene flow) (e.g. Conte et al. 2012). If the limited extent of outlier sharing often observed in recent studies is mainly due to these evolutionary reasons (rather than technical explanations), we would therefore expect sharing to increase when sampling geographically closer locations (unless adaptation is completely based on local novel mutation even on small geographical scales) (e.g. Roda et al. 2013).
As an alternative (but not mutually exclusive) explanation, various technical issues may affect outlier scans and therefore the observed extent of sharing of outliers. Whether a marker appears as an outlier reflects not only the effect of divergent selection, but also potentially several other factors. Regions of low intrapopulation diversity may be identified as outliers even if they do not show increased divergence between populations (Noor & Bennett 2009;Cruickshank & Hahn 2014). Features of the genome may also have an influence on F ST . For example, genomic regions of low recombination rate (e.g. near centromeres) tend to show increased F ST estimates (e.g. Roesti et al. 2012;Renaut et al. 2013), because in these regions the effects of both divergent and purifying selection spread further along the chromosome. If local recombination rate is a conserved feature, this may lead to sharing of outlier loci that is not necessarily caused by divergent selection alone.
Other factors may generate spurious outliers which are not expected to be shared among populations, and which could therefore decrease the observed amount of sharing and falsely indicate large differences in the genetic basis of parallel divergence. A proportion of outliers will often represent false positives, while some loci influenced by divergent selection will be missed. False positives may occur due to violations of the neutral model applied in outlier scans, for example deviations from the assumed population structure or demographic history (Foll & Gaggiotti 2008;Excoffier et al. 2009). Additionally, false positives may simply occur due to sampling noise (i.e. allele frequencies in the sampled individuals do not reflect population allele frequencies). The probability of misrepresentation of population allele frequencies increases with smaller sample sizes. Effects of sampling noise may also be increased when intrapopulation genetic diversity is spatially structured and sampling is limited to a few spots (as may often be the case for practical reasons). Furthermore, some studies pool DNA/RNA from multiple individuals of the same population (e.g. Roda et al. 2013;Westram et al. 2014), further reducing the confidence in the data. A large number of false positives represent a possible explanation for a low extent of observed outlier sharing because such outliers may not be expected to be the same in different locations.
Following from the above considerations, we predict the following patterns: (i) if a low observed amount of outlier sharing among distant locations is mainly due to the biological explanations outlined above, outliers may still show elevated levels of differentiation in geographically close populations (where there is more potential for gene flow and a shared evolutionary history). (ii) If low sharing is mainly due to false positives caused by sampling noise, outliers should not be repeatable (i.e. when taking new samples from the same locations, outliers should often not show elevated levels of differentiation again).
Here, we test for these patterns in a system where parallel divergence is well studied, the intertidal snail Littorina saxatilis. This ovoviviparous periwinkle has evolved morphologically distinct ecotypes, partly separated by reproductive barriers, in multiple geographical locations including Spain, Sweden and the UK (Rol an-Alvarez 2007; Johannesson et al. 2010;Butlin et al. 2014). A model of repeated divergence, within and between countries, fits genetic data better than a single origin of the ecotypes . One ecotype ('crab ecotype') is adapted to predation and occurs in crab-occupied habitats (e.g. boulder fields), while the other one occurs in crab-free environments (e.g. cliffs) and is adapted to exposure to wave action ('wave ecotype'). Parallel phenotypic divergence involves multiple traits, including body size (crab ecotype larger), shell thickness (thicker shell in crab ecotype) and behaviour (crab ecotype more wary) (reviewed in Johannesson et al. 2010). However, there are also nonparallel patterns of divergence, for example with regard to ridging of the shell (Johannesson et al. 2010). In addition to divergent natural selection, habitat choice and assortative mating contribute to reproductive isolation between the ecotypes (Johannesson et al. 1995(Johannesson et al. , 2010Rol an-Alvarez 2007;Webster et al. 2012). Multiple studies have analysed genetic divergence between ecotypes in this system, finding that gene flow between ecotypes is restricted across the genome (Grahame et al. 2006;Panova et al. 2006;Galindo et al. 2009) and detecting outliers with differentiation increased above a low background level of differentiation (Wilding et al. 2001;Galindo et al. 2010;Westram et al. 2014;Ravinet et al. 2016).
Because population sizes are large, and some populations share a recent postglacial history (Swedish and UK populations) , it seems feasible that parallel adaptation involves a partly shared genetic basis, as observed in other systems (e.g. sticklebacks; Hohenlohe et al. 2010;Jones et al. 2012). Most outlier studies so far have been limited to specific geographical regions (Spain, Galindo et al. 2009;Sweden, Hollander et al. 2015;UK, Wilding et al. 2001;Grahame et al. 2006;Galindo et al. 2010). However, a recent transcriptome scan comparing one location from each of Sweden, Spain and the UK found relatively low sharing of outliersapproximately 20% of outliers were shared among pairs of countries (Westram et al. 2014). An outlier scan based on AFLP (amplified fragment length polymorphism) markers found <10% sharing on a similar geographical scale . A recent study using RAD sequencing suggests that sharing among islands within the same archipelago in Sweden may be similarly low (Ravinet et al. 2016).
In this study, we aim to answer two main questions: (i) Do outlier loci still show increased differentiation when they are sequenced in new samples from the locations of original outlier detection? In order to gain more reliable data, we perform more extensive sequencing and obtain individual sequencing data (rather than pooled, as in Galindo et al. 2010;Westram et al. 2014). (ii) When using a hierarchical sampling design, including geographically close (within-country) and distant (between-country) samples, do outliers show elevated F ST levels in geographically closer locations?
We use a novel approach relying on resequencing of outliers detected in earlier work (see above) as well as control loci, rather than performing a de novo genome scan (which requires the sequencing of a large number of random loci across the genome). Thus, instead of detecting novel outliers, we ask whether known outliers show elevated F ST in various geographical locations. All loci were sequenced in snails from six locations, which contained (a) locations where the outliers were first detected (original locations), (b) a second (new) location in each country where outliers were first detected and (c) locations in other countries. This sampling design allowed us to answer both of our main questions.
As one specific objective was to test the confidence in previously identified outliers, we aimed at increasing the amount of sequence information per locus, rather than including a large number of loci. Towards that aim, we used a capture sequencing approach (e.g. Nadeau et al. 2012;Smadja et al. 2012;Hebert et al. 2013), which allows for the generation of longer sequences for regions for which some sequence information is already available. In this approach, genomic DNA libraries are generated and hybridized with probes targeting regions of interest, and only targeted fragments are retained and sequenced. Because the library fragments are typically longer than the probes, sequence information upstream and downstream of the probe region can be obtained.

Selection of outlier and control loci from previous studies
Various previous studies have identified outliers between ecotypes, using different methods. We included outliers from an RNAseq transcriptome scan (Westram et al. 2014), a 454 transcriptome scan (Galindo et al. 2010) and a RADseq genome scan (Ravinet et al. 2016) in the current work. These outlier types are referred to as 'RNAseq', '454' and 'RAD' outliers in the following (Table 1; Fig. 1). We also included some loci that showed gene expression differences in a microarray study but for which we did not have information about sequence divergence (M. Panova, unpublished). While we hypothesized that most transcriptome and genome scan outliers will show elevated F ST in their original location if outlier detection is reliable, this hypothesis does not apply for the genes showing expression differences. If gene expression is controlled by genomic regions distant from the gene itself, we would not expect these loci to show elevated F ST levels here.
Most of the previous studies did not cover all three countries studied here (Spain -SP, Sweden -SW, United Kingdom -UK), but used samples from multiple locations within countries (Table 1; Fig. 1). In some cases, they identified local outliers as well as outliers shared among multiple locations.
Depending on how the original study was conducted, outlier data sets consisted of contigs from the Littorina saxatilis reference genome (from one 'crab' ecotype snail from Sweden; The IMAGO Marine Genome project, http://www.cemeb.science.gu.se/research/imagomarine-genome-projects/; project coordinated by Anders Blomberg and Kerstin Johannesson), contigs from EST or transcriptome assemblies (Galindo et al. 2010;Canb€ ack et al. 2011), or RADtags (Ravinet et al. 2016. Table 1 Numbers of outlier and control loci analysed in this study. The first column indicates the original outlier scan studies. The following columns show in which locations outliers were identified (see Fig. 1 for location codes). Note that outliers shared across multiple locations were defined in varying ways. They were either detected in multiple locations independently using multiple, local outlier scans (Ravinet et al. 2016;Westram et al. 2014), or they were identified in pooled samples that contained individuals from two locations (Galindo et al. 2010). In the gene expression study (M Panova, unpublished) country-wise expression outliers were detected as loci showing an overall ecotype effect in a country-wise ANOVA. Outliers shared between Sweden and the UK were then identified as those that were significant in both countries independently. The last column shows the number of outlier/control loci followed up in this study, as well as the total number of probes designed for these loci (in brackets). The numbers of outlier loci for which we could actually obtain data in the current study (i.e. which were not discarded due to missing data) are shown in Figs S7-S10 (Supporting information) for each of the studied locations Detailed methods for outlier selection and probe design are described in Data S1 (Supporting information). Briefly, we used reference genome contigs as the basis for probe design, as these are often longer than the original outlier sequences and contain introns as well as exons. These genome contigs were identified using a BLAST search of the outliers against the reference genome. The strongest outliers (sorted by P-value or F ST estimate) which had a BLAST result in the genome were selected for probe design. We included local outliers (i.e. those appearing in a single location only) as well as outliers shared among different locations (Table 1), but analysed them separately (see below).
We also included control loci that should reflect the extent of neutral divergence between ecotypes (Table 1; Data S1, Supporting information). To obtain control loci comparable to transcriptome outliers, we obtained random sequences from the Littorina sequence database (Canb€ ack et al. 2011), which contains expressed sequences. For control loci comparable to RAD outliers, we randomly selected nonoutlier RAD loci from the study by Ravinet et al. (2016). These control loci were then processed in the same way as the outlier loci.
In the following, loci obtained from original studies and databases (i.e. both outliers and control loci) will simply be referred to as 'loci'. Note that each locus can potentially be represented by multiple reference genome contigs (e.g. due to splicing and the shortness of some of the reference contigs, see Data S1, Supporting information).

Probe design for capture sequencing
Probe design was performed by RAPiD Genomics (FL, USA) for the reference contigs selected as described above. Probes were designed to map only to a single position in the reference genome to avoid paralogs and repetitive regions and were further selected based on GC content. This resulted in a total of 2313 120-bp capture probes, representing 271 loci (163 outlier loci and 108 control loci). To obtain long sequences, there were usually multiple nonoverlapping probes per locus, separated by short gaps. Numbers of loci and probes are shown in Table 1.

Sampling of Littorina saxatilis
Littorina saxatilis snails were sampled from two locations (100-300 km apart) in each of three countries, Spain, Sweden and the UK (Fig. 1, central plot; coordinates in figure legend). To test whether outliers again show elevated F ST levels, we included the original locations where outliers were first detected (SP: S; SW: ANG; UK: T); but we also sampled a second location per country where outliers had not been tested before (SP: B; SW: OCK; UK: W), to analyse the extent to which outliers showed elevated F ST in other locations in the same country. Twelve female individuals of each ecotype were sampled, avoiding any area of habitat overlap or unclear morphological classification. This resulted in a total of 144 L. saxatilis individuals from 12 populations (six locations 9 two ecotypes). Snails were stored in 100% ethanol.

DNA extraction and sequencing
DNA was extracted from a piece of foot tissue from each individual using a protocol modified from Winnepenninckx et al. (1993). Individually barcoded libraries were prepared and fragments were captured, at RAPiD Genomics. Fragments were then sequenced on an Illumina HiSeq 2000 machine (125-bp single-end reads), generating on average 2.5 M reads per individual sample (after excluding three samples that failed).

Read mapping, variant calling and consensus generation
Reads containing adapter contamination were removed, and quality trimming (-q 20) was performed using the software CUTADAPT (Martin 2011). Sequences shorter than 40 bp after trimming were discarded.
Four individuals were excluded because of low read numbers. The remaining reads were mapped to the Littorina saxatilis genome using the program bwa-mem (Li 2013). Only primary hits (i.e. excluding shorter, split hits) mapping to the targeted reference contigs and with a mapping quality of at least 30 were retained for further analyses (these were on average 76% of the reads per sample). PicardTools (http://broadinstitute. github.io/picard/) MarkDuplicates was used to remove reads potentially originating from PCR duplicates based on identical 5 0 mapping positions.
For each individual and reference contig, a consensus sequence was generated, using the recommended pipeline in samtools (Li et al. 2009, http://www.htslib. org/doc/samtools-1.1.html). For that, first, an mpileup file (containing the mapping data for each position) was generated, and variants were called with the bcftools 'call-c' command, using bases with a quality of at least 20. From this, a fastq file was generated using the VCFUTILS 'VCF2FQ' program, keeping only positions with a minimum read depth of 20, and then converted into a fasta file using seqtk (https://github. com/lh3/seqtk). Regions of low quality or low sequencing depth, as well as regions containing indels, were replaced by N. Working with sequences (rather than extracting SNPs) allowed us to obtain a more reliable F ST estimate, as information was integrated over multiple SNPs.
The individual consensus sequences contained ambiguous bases for heterozygous sites. Two 'pseudohaplotypes' were produced by generating two copies of the consensus sequence and replacing ambiguous bases by the two component bases: that is, the base composition of the sequences was correct, while the phase (which was irrelevant for downstream analyses) was arbitrary. Sequences shorter than 100 bp were discarded, and the remaining sequences were stored in a single fasta file per reference genome contig. Finally, for each locus for which we targeted multiple reference genome contigs (see Data S1, Supporting information), the contig-wise fasta files were concatenated.

Filtering based on sequencing depth, Hardy-Weinberg equilibrium and number of SNPs
Visual inspection of the fasta files indicated that some SNPs were heterozygous in almost all individuals in one or more populations. We suspected that this might be due to the presence of duplicated loci/repetitive sequences in our data. Potentially duplicated sequences were filtered using two approaches. First, we discarded loci with unusually high sequencing depth as follows. For each individual and reference genome contig, we averaged the sequencing depth across all covered bases. We then calculated the average sequencing depth across all contigs for each individual. Then, for each individual, the value of each contig was divided by this individual average. In this way, we obtained a contig-wise value for each individual that was corrected for general differences in sequencing depth between individuals. To obtain a sequencing depth estimate for each population and contig, these individual-based values were averaged for each reference contig and population (ecotype within site). An average value of two for a given contig and population indicates that the average coverage depth for this contig is approximately twice as high as expected, which is high even when allowing for some variation between contigs. Therefore, loci associated with one or more reference contigs with an average coverage depth >2 in any of the 12 populations were discarded.
Second, SNPs with frequency differences between paralogs are expected to show negative F IS values indicating deviations from Hardy-Weinberg equilibrium (HWE). We therefore used the following procedure to filter loci with evidence for departure from HWE. SNPs were extracted from alignment files for each locus. For each SNP with data for more than five individuals in a population, the number of homozygotes and heterozygotes was determined. SNP-wise F IS was then calculated as (H e -H o )/H e , where H e is the expected heterozygosity based on allele frequencies, and H o is the observed heterozygosity. Deviations of genotype frequencies from HWE were tested for each SNP and population using the approach of Wigginton et al. (2005). The fraction of SNPs with significantly negative F IS was determined for each locus and population. In addition, the average F IS per locus was calculated, using the average H e and H o estimates across SNPs in the equation above. Loci were discarded in a populationwise manner if their average F IS was below À0.4 or above 0.4, and/or if more than 10% of SNPs showed significantly negative F IS estimates. Loci with fewer than two SNPs across individuals were also discarded as the confidence in F ST estimates should increase with the number of SNPs.

Nucleotide diversity (p)
Several population genetic statistics were calculated for each locus. The nucleotide diversity (p) is the average number of sequence differences per nucleotide position between sequences sampled from the same population. To calculate p, the number of sequence differences was counted for each possible pair of sequences from the focal population. These counts were summed over all sequence pairs and divided by the total number of nucleotide positions compared. Calculations were performed only if there were sequence data for at least six individuals in the focal population. We used the PYTHON EGGLIB library (De Mita & Siol 2012) and custom PYTHON scripts based on scripts developed by Martin et al. (2015).

d xy
The measure d xy (Nei 1987) corresponds to the average number of sequence differences per nucleotide position between sequences sampled from two different populations (here, two ecotypes from the same geographical location). We needed a measure of d xy to calculate F ST (see below).
The number of sequence differences between each possible pair of sequences from the two ecotypes was counted. These counts were summed over all possible pairs and divided by the total number of sites compared. Calculations were performed only if there were sequence data for at least six individuals in both focal populations. We used the PYTHON EGGLIB library (De Mita & Siol 2012) and custom PYTHON scripts based on scripts developed by Martin et al. (2015).

F ST
For each location and locus, we calculated F ST between the two ecotypes. Following the suggestion made by Bhatia et al. (2013), we used Hudson's estimator of F ST (Hudson et al. 1992), which is less sensitive to sample size issues than other estimators. F ST is defined as 1-(Hw/Hb), where Hw is the average number of sequence differences within populations, and Hb is the average number of sequence differences between populations. Here, this measure is equivalent to 1-(mean p/ d xy ), where mean p is the average p across the two populations, if p and d xy are calculated as explained above. We therefore simply used these values to calculate F ST .
We also calculated F ST between locations (within ecotypes), using the same approach.

F ST values of outliers and patterns within and between countries
Our approach differed from a classical genome scan as we sequenced previously detected outlier loci as well as a set of control loci, rather than sequencing loci randomly picked from the genome. Therefore, we did not aim at identifying outliers in the current work and could not apply standard genome scan software (e.g. BAYESCAN, FDIST). Instead, we used the empirical F ST distribution to ask whether outliers showed elevated F ST estimates. When we mention 'outliers' in the following, we always refer to loci detected as outliers in previous work, and not to inferences from the F ST estimates observed in the current study.
For each location, we first used the control loci, pooling those reflecting RAD loci and transcriptome loci, to obtain an expected distribution of F ST between ecotypes. We then used the 80% quantile of this distribution as a threshold to identify those outliers that showed a higher-than-expected F ST estimate. We note that this threshold is arbitrary. We picked the relatively low threshold of 80% because the number of control loci with sufficient data was relatively small (the number and identity of control loci with insufficient data differed between locations, leaving between 58 and 84 loci per location; that is, each control distribution was based on a different set of loci.). With a higher quantile, the position of the threshold is driven by a very small number of loci and may therefore be less comparable among populations. However, for completeness we report the results for the 95% quantile in Table S2 (Supplementary  Information).
Each outlier was coded in each location according to whether it fell above the 80% quantile of the respective control distribution or not. For each outlier type (RNAseq, 454, RAD, expression) separately, we then estimated the probability that outliers show elevated F ST (a) in their original location(s), (b) in a second location in the original country/countries and (c) in different countries. We set up a likelihood function assuming sampling from a binomial distribution, where n is the number of loci and P is the probability of 'success' (i.e. of obtaining a locus with F ST > 80% quantile of the control distribution). Using maximum-likelihood estimation, we determined P and obtained the log-likelihood (mle2 function in the BBMLE package in R, using the 'L-BFGS-B' method; limited-memory Broyden-Fletcher-Goldfarb-Shanno). The probability, P, equals the observed proportion of loci above the 80% quantile; the log-likelihood estimation allowed us to compare different models.
To test whether the probabilities for (a), (b) and (c) were significantly different, we used a model selection approach to compare a total of eight models for each outlier type. The simplest model assumed a single probability of 0.2 in all three cases (i.e. no parameter estimation, 'fixed' in Table 2). A value of 0.2 is equal to the chance expectation because 20% of loci are expected to fall above the 80% quantile of the control distribution by chance. The second model still forced the probability to be the same for (a), (b) and (c), but allowed for the parameter to be estimated. Additional, more complex models allowed separate probabilities for the three different cases (details see Table 2). For each model, the AIC (Akaike information criterion) was calculated and the model(s) with the lowest AIC was selected as the best fit for the data.
As mentioned above, the control distribution against which outliers were compared was generated by pooling RAD and transcriptome control loci (because the number of loci in each category was relatively small; Table 1). However, we note that the F ST distributions of these loci might differ, for example if RAD loci are less likely to be located in genic regions. We therefore compared the two F ST distributions for each location using a Kolmogorov-Smirnov test. In two cases, it was significant (OCK: P < 0.01; T: P = 0.03), with the average F ST for RAD loci being higher. To ensure that our results were not affected by this, we repeated the maximumlikelihood analysis described above, after separating RAD and transcriptome loci, testing for elevated F ST (using the quantile approach described above) when using only the same locus type to generate the control distribution. The results were qualitatively and quantitatively very similar to the ones obtained with the pooled distribution and are therefore not shown.

Data set after filtering
A total of 253 loci were retained after filtering for number of SNPs, sequencing depth and Hardy-Weinberg equilibrium (see also Fig. S2, Supporting information). On average, a locus had data in~10 (of 12) populations and an average sequence length of 1165 base pairs (Fig. S2, Supporting information). Per location, between 133 and 205 loci passed filters and had data in both ecotypes, so that F ST could be calculated.
F ST estimates between ecotypes within locations, and between locations (using control loci only) are shown in Table S1, Supporting information. As expected, F ST between ecotypes was generally lower than between locations, and F ST between locations within countries was lower than F ST between locations from different countries. Differentiation between ecotypes in the new UK location (W, west coast) could not be detected (F ST = 0, compared to 0.04 at the other UK location) despite clear phenotypic differentiation.

F ST values for outliers in their original location
One of our main questions was whether loci identified as F ST outliers in earlier studies still showed elevated F ST when studied in a new sample from the same location. Table 2 shows the probability that outliers fall above the 80% quantile obtained from the F ST distributions of control loci, estimated using maximum likelihood. The best models (those with the lowest AIC for each outlier type) are also shown in Fig. 2 (see Figs S3-S6, Supporting information, for a direct comparison of outliers with the control distribution; and Figs S7-S10, Supporting information for the numbers of loci falling above the 80% quantile for each outlier type). The 'original locations' entries reflect these probabilities for the locations where Table 2 Probabilities that outliers resequenced in samples from original and new locations fall above the 80% quantile of the control F ST distribution. Eight different models estimated the probability for outliers in the location where they were first detected (original location), a second location in the same country (new location, original country), and other countries. Models varied in whether probabilities were estimated separately for these different categories or not. Where a single probability was estimated across multiple categories, the respective cells in the table are merged. 'Fixed' indicates that probabilities were not estimated by the model, but fixed at the value expected if outliers are as likely to have elevated F ST estimates as control loci (0.2). The model with the lowest AIC for each outlier type is highlighted in bold. Any model <2 AIC units different from the model with the lowest AIC, that is with DAIC < 2, is highlighted as well as these models cannot clearly be distinguished. Numbers in brackets reflect the number of observations on which the estimation is based, that is the number of locus*population combinations for which an F ST estimate was available. (a) Outliers detected in a single location in the original study. In the case of RNAseq outliers and loci showing expression differences, this means that F ST was elevated in only one of the three/two countries tested in the original work, while 454 and RAD outliers were only tested in a single country and no information about the others was available. (b) Outliers shared between two countries in the original work. The 'original location' column shows the probability as estimated for both locations simultaneously outliers were first detected. These probabilities were very high (>0.9) for the RNAseq and RAD outliers. The probabilities for the 454 outliers and the nonshared loci showing gene expression differences were lower, but in both cases still clearly above the chance expectation of 0.2 (horizontal line in Fig. 2). In cases where multiple models obtained similarly low AIC values, these results were supported by all such models (Table 2). High probabilities were still observed when the threshold for identification of loci with elevated F ST levels was moved from the 80% quantile to the 95% quantile of the control distribution (Table S2, Supporting information).

F ST of outliers in geographically close locations
We asked whether outliers from a given location are also likely to show elevated F ST in another location within the same country. In Fig. 2, this corresponds to the 'new locations, original countries' bars. In all cases except for the shared loci showing expression differences, the probability was higher than the 0.2 expected by chance, strongly so for the RNAseq and RAD outliers and the nonshared loci showing expression differences. These results indicate that F ST of outliers is elevated not only in the location where they were originally detected, but also in other locations in the same country. This conclusion was also supported by other models with low AIC values similar to the 'best' models shown in Fig. 2 (Table 2). Furthermore, the same conclusion emerged when the 95% rather than the 80% quantile of the control distribution was used in the analysis (Table S2, Supporting information).
Elevated F ST in geographically close locations was also reflected by the fact that F ST values (across all loci) were strongly correlated between locations within countries, particularly in Spain and Sweden (Pearson's correlation coefficient: Spain -0.78; Sweden -0.71; UK -0.48; Fig. S11, Supporting information).

F ST of outliers in geographically distant locations
The 'other countries' bars in Fig. 2 indicate the probability that outliers fall above the 80% quantile in countries other than where they were first detected. For all nonshared outliers, this probability was larger than 0.2. For the RNAseq outliers, there was only a slight increase above 0.2, indicating that most outliers are country specific. In contrast, for the RAD outliers the estimated probability in a different country was the same as that for a new location in the original country (0.68), indicating that F ST of outliers may be elevated on a large geographical scale. Clearly elevated F ST in a different country for the RAD data was also supported by alternative models with similarly low AIC values (Table 2). When the 95% rather than the 80% quantile of the control distribution was used to identify outliers showing elevated F ST (Table S2, Supporting information), the probabilities were lower but still clearly above the chance expectation (0.05).

Discussion
Several recent studies have found that the proportion of outliers shared among instances of parallel divergence may be small, but the relative roles of technical issues and evolutionary explanations causing this pattern are often not clear. Here, we have tried to disentangle these issues. First, we asked whether outliers detected in earlier studies show elevated F ST when sequenced in new samples; second, we asked whether outliers show a pattern of decreasing outlier sharing with increasing geographical distance, as expected if the observed level of sharing reflects the true extent of a common genetic basis of adaptation. We found that most outliers did again show high levels of differentiation (F ST ), but the few outliers with low F ST may indicate false-positive outliers that could have decreased observed sharing among locations. However, we found clear evidence for elevated F ST of outliers in geographically close and sometimes even geographically distant locations, reflecting both recent common ancestry (Panova et al. 2011) and the potential for gene flow.
We used a capture sequencing approach to resequence outliers from previous studies as well as   Table 2). The models estimated the probability for outliers in the location where they were first detected (original location), a second location in the same country (new location, original country), and other countries. Models varied in whether probabilities were estimated separately for the different categories or not. If the best model did not distinguish between probabilities for different categories, the bars are merged here. surrounding genomic regions. Capture sequencing is a powerful approach when the number of loci of interest is too large for PCR-based approaches and some sequence information is already available (Jones & Good 2016). While the latter represents a limitation in a nonmodel organism without a reference genome, we show here that a draft genome consisting of short contigs (N50:~950 bp) is sufficient for probe design and generating relatively long stretches of sequence data. We obtained sequence information for almost all of the targeted loci (although we had to filter some out due to potential duplication, see below). The advantage of capture sequencing is that data for large genomic regions (rather than short markers; e.g. RAD sequencing) can be obtained without the need to resequence the whole genome. Longer sequences provide more information for a given locus because they often contain multiple SNPs (on average 24 in this study; average sequence length 1165 base pairs), making F ST and other estimates more reliable. The additional advantage of capture sequencing exploited in this study is that specific genomic regions of interest can be targeted, rather than, or in addition to, random markers across the genome. While capture sequencing yielded information for almost all the targeted loci, some loci had to be removed (at least locally) because they showed unusually high levels of heterozygosity. This is probably caused by repetitive regions/paralogs that are sufficiently similar to be captured by the same probe (and that were assembled into the same genome contig). In studies where pooled samples are used (or where deviation from Hardy-Weinberg equilibrium is not tested), such loci may affect the neutral F ST distribution because they introduce spurious SNPs with low differentiation. This observation emphasizes the need to confirm outliers detected in pool-seq studies using individual barcoding. In general, these loci represent an interesting phenomenon as they seem to be more prevalent in some populations (e.g. crab ecotype, ANG, Sweden; and wave ecotype, S, Spain; Fig. S1, Supporting information). This might be the result of recent duplications of larger genomic regions. There is evidence of common gene duplications and variation in copy numbers both between the geographic populations and the ecotypes in Littorina (Panova et al. 2014). These loci will be an interesting target for future studies especially given the potential role of chromosomal rearrangements and gene duplications in divergence and speciation (Lynch & Force 2000;Faria & Navarro 2010).
Using the filtered data set, we found that general patterns of differentiation between populations in control loci are in line with results from earlier studies. Differentiation between countries was relatively high, especially between Spain and the other two countries (Table S1, Supporting information), where populations probably have a more recent shared origin (Doellman et al. 2011;Panova et al. 2011). Genetic differentiation between locations within the same country was lower. Differentiation between ecotypes was generally lower than between locations and showed similar patterns to earlier studies, with ecotypes being most differentiated in Spain Westram et al. 2014).
Regarding our first main question, we show that the probability that outliers exhibit elevated F ST in their original location is not always 100% (Fig. 2), which is not surprising. First, methods of outlier detection differed among studies, two of them (Galindo et al. 2010;Ravinet et al. 2016) using model-based approachesthe softwares WINKLES (Wilding et al. 2001) and FDIST (Beaumont & Nichols 1996) and one using the empirical F ST distribution (Westram et al. 2014). These methods differ in the underlying assumptions and sensitivity (e.g. Storz 2005). Maybe most importantly, observed allele frequencies (both in previous work and in the current study) were simply subject to sampling noise due to small sample size, which affects the reliability of outlier scans (Vilas et al. 2012). This may be even more true for studies where pooled samples were used, and individuals may contribute differently to the pools (Galindo et al. 2010;Westram et al. 2014). Similarly, two of the earlier studies (Galindo et al. 2010;Westram et al. 2014) were based on transcriptome data, with noise introduced by expression variation among individuals, and potentially between different alleles at the same locus. The capture sequencing and analytical approaches used here were expected to provide more robust estimates of differentiation than previous studies and yet outlier repeatability was high. Outliers that did not show high differentiation in the current work may therefore represent false positives. These are not expected to be shared and as a result may reduce the observed amount of outlier sharing.
We suggest that replication can be an important means for estimating the reliability of outliers. Many studies use replicates in the form of pairwise samples from several different geographical locations and define outliers not shared among comparisons as false positives (e.g. Bonin et al. 2006). We believe that this method is suitable to increase confidence in outliers shared among locations and may be appropriate especially in systems where replicates are geographically close and where high sharing among locations is expected. However, it will not be appropriate for confirming location-specific outliers, or for understanding the relative contribution of shared vs. local outliers to the overall amount of divergence. These points can only be targeted using replicates from the same location.
Despite some outliers with low F ST , overall the RNAseq and RAD outliers show probabilities of >0.9, and the 454 outliers of 0.7, to have elevated F ST in the original location (i.e. to fall above the 80% quantile of the control distribution; Fig. 2). These results indicate that genuine outliers could be detected quite reliably in the original work despite limited sample sizes, pooled sequencing and the use of transcriptome data in some of the studies. Additionally, patterns of differentiation in short markers in the original studies (e.g. RAD tags; Ravinet et al. 2016) are recovered when studying longer stretches of sequence. The repeatability of high differentiation patterns represents evidence that at least in the L. saxatilis system, the low sharing of outliers between countries is unlikely to be solely caused by a large number of false positives caused by sampling effects. It also indicates that F ST patterns are temporally stable over several years.
However, it is important to note that loci not directly relevant for ecotype divergence could also show repeatable patterns of high divergence. For example, outliers mainly caused by low recombination rates in combination with selection on distant linked loci are probably repeatable. Further studies of demographic history and variation of recombination rate across the genome will be necessary to rule out these types of effects. Ultimately, confidence in outliers can only be gained if independent signatures of selection can be detected, they can be associated with divergent traits (e.g. via QTL mapping or, ideally, via functional manipulation), or if selection acting on them can be tested more directly (e.g. experimentally; Soria-Carrasco et al. 2014).
In addition to outliers from genome scans, we also followed up loci showing expression differences between ecotypes in Sweden and the UK (Panova et al. unpublished). The nonshared loci often had high F ST in the original location, similar to the genome scan outliers (Table 2, Fig. 2). This may indicate that in some cases regulatory elements may be closely linked with the coding sequence. However, this does not seem to be a general pattern as the shared loci did not have high F ST estimates (Table 2, Fig. 2). In these cases, gene expression differences may be caused by trans-regulatory elements remote from the expressed sequence and therefore probably not included in our targets. A similar pattern has been found, for example, for the divergence between dwarf and normal whitefish, where the extent of expression differences is not correlated with the amount of sequence divergence at the same locus (Jeukens et al. 2010).
Our second main question was whether outliers showed elevated F ST in geographically close and distant locations. Because the RNAseq, 454 and RAD outliers were identified in different ways in the original work and produced different results here, we consider them separately in the following discussion.
The RNAseq outliers were the group of loci for which we could perform the most powerful analysis because the number of observations was much larger than for the other marker types (see Table 2, Figs S7-S10, Supporting information). The study identifying the RNAseq outliers (Westram et al. 2014) differed from the other two outlier scan studies in that loci were tested in all three countries (Spain, Sweden and UK; see Table 1). Therefore, there were clear expectations: Nonshared outliers should show low F ST estimates in both other countries, and outliers shared between two countries should show low F ST in the third country. In accordance with these expectations, we found that the RNAseq outliers, while having elevated F ST values in the country/countries where they were first detected, exhibited only slightly increased probabilities for elevated F ST in the other country/countries (Fig. 2). In contrast, RNAseq outliers had a probability of about 0.8 of showing elevated F ST in locations within the same country, where they had not been tested before (Fig. 2). These results are consistent with a similar genomic basis of divergence on a small (but not a large) geographical scale. They also suggest that the low levels of outlier sharing among countries observed earlier (Westram et al. 2014), and supported in the current study, cannot be explained by a large number of false positives due to sampling effects. If such effects played a large role, we would not expect to find any geographical pattern; instead, outliers should rarely show elevated F ST in any other location.
However, for the 454 outliers, which should mostly be located within expressed genes similar to the RNAseq outliers, we obtained a different result. The probability of showing elevated F ST in a second location in the original country was relatively low (0.38), and the same as that estimated for other countries (Fig. 2). As we expected elevated F ST within the same country, but not in other countries (as observed for the RNAseq outliers), this result is slightly surprising. However, we note that the number of loci in this analysis was much smaller than for the RNAseq outliers. It is possible that because of this, by chance, we included a relatively high proportion of loci that are location specific.
The RAD outliers showed an elevation of F ST not only in geographically close but also distant locations (Table 2, Fig. 2). This result is surprising especially in so far as most of these outliers were not shared among Swedish locations in the original study, and so were not expected to show elevated F ST on an even larger geographical scale. However, here we used a relatively relaxed criterion to identify loci with 'elevated F ST ' (loci that fall above the 80% quantile of the neutral F ST distribution) compared with the original study, which may explain why loci can show elevated F ST across large geographical scales without appearing as shared outliers in the original study (Ravinet et al. 2016) (also see discussion of this point below). The result of high F ST even in other countries might also partly be explained by the fact that these outliers are less focussed on coding regions. Markers in regions with low gene densities might also be in regions with lower recombination rates (Flowers et al. 2012), making elevation of F ST on large geographical scales more likely because associations with selected alleles are more likely to be maintained.
Taken together, the results from the different outlier types suggest that outliers frequently show elevated F ST on small, and sometimes also on large geographical scales. This contrasts with work in other systems where outliers were often not shared among instances of parallel evolution, for example in stick insects (Soria-Carrasco et al. 2014) and groundsels (Roda et al. 2013). While these differences between study systems may be explained by taxon-specific patterns of gene flow and evolutionary history, or by differences in the geographical sampling scheme, it is more surprising that our current results seem to contradict earlier work on L. saxatilis, where outlier sharing was low both on small (within archipelago; Ravinet et al. 2016) and large (between country; Westram et al. 2014) geographical scales. The discrepancies between studies may partly be explained by the fact that here we did not try to identify outliers using a stringent threshold, but rather asked whether known outliers showed elevated F ST in other geographical locations. In the earlier studies, outlier loci may frequently have shown high differentiation in a second location but not met the threshold for outlier detection. One reason for such patterns could be the position of these loci in regions of low recombination, where they may be affected by selection on multiple linked loci (Roesti et al. 2012), but only appear as strong outliers locally. Alternatively, if divergent traits are polygenic and a lot of loci are affected by divergent selection, not all of them may appear as statistically significant outliers at any given point in time, which might lead to differences among locations if a stringent criterion for outlier detection is used. Many of the traits responding to divergent selection in L. saxatilis are probably highly polygenic (e.g. body size and shape, which show continuous variation across the hybrid zones between ecotypes; e.g. Hollander et al. 2015).
Our results indicate that, even though there is evidence that ecotypes have evolved repeatedly within countries and within regions ), divergence may partly be based on the same genomic regions. This may be due to shared standing genetic variation potentially predating the colonization of the different areas, as well as gene flow among locations. Sharing of outliers, sometimes even on large geographical scales, has been observed in other system, for example for the divergence between freshwater and marine sticklebacks, where the same genomic regions are involved in divergence repeatedly (Hohenlohe et al. 2010;Jones et al. 2012). Even though gene flow between locations in L. saxatilis is probably low, especially given that this species is ovoviviparous and has no pelagic larval stage (Reid 1996), small amounts of gene flow may be sufficient to transport strongly adaptive alleles among locations (Morjan & Rieseberg 2004). Additionally, L. saxatilis populations are typically very large; therefore, the maintenance of a large amount of ancestral genetic variation seems conceivable. We did not find a pattern of more negative Tajima's D (Tajima 1989) for outliers compared with control loci (Figs S12-S15, Supporting information), which is consistent with adaptation from standing genetic variation as well as variation introduced by past gene flow, but not with hard sweeps of recent novel mutations or alleles introduced by recent gene flow (Nielsen 2005;Barrett & Schluter 2008).
Both shared old genetic variation and gene flow are expected to be higher among geographically close populations, likely explaining the differences between close and distant populations that we observe mainly for the RNAseq outliers. Additionally, or alternatively, selection pressures could also be more similar among geographically close populations, which should increase the proportion of shared outliers. For example, in sunflowers, species pairs diverging across a similar environmental gradient show more similar patterns of divergence than species pairs diverging across different gradients (Renaut et al. 2014). In L. saxatilis sampling sites in the UK, the crab ecotype is located in the mid shore and the wave ecotype in the high shore. In Sweden, such a tidal pattern is absent. Therefore, any outliers generated by selection along this axis (e.g. those conferring adaptation to desiccation resistance) could potentially be shared among UK locations, but are not expected to be shared with Swedish locations (Westram et al. 2014).
In addition to the geographical patterns of outliers, their biological role is of interest. In the future, the availability of an improved version of the L. saxatilis reference genome and of more information for related mollusc species will allow for a detailed investigation of the molecular functions of outlier loci. This will help in answering, for example, the question whether similar functions are fulfilled by different loci in different geographical locations. This study and previous work already allow for some insights. A previous transcriptome scan identified genes involved in shell calcification, foot muscle operation and energetic metabolism (Galindo et al. 2010). Here, we annotated the outlier loci again showing very high levels of F ST (above the 95% quantile of the control distribution) in the current study (methods described in Data S1, Supporting information). These loci may be particularly good candidates for further in-depth study, and their annotations are shown in Table S3 (Supporting  information). For example, one of the outlier loci showed similarity with proteins of the family of alpha carbonic anhydrases. These enzymes are known for their role in the biomineralization of calcium carbonate structures (Moya et al. 2008;Le Roy et al. 2014) like mollusc shells. Carbonic anhydrase secreted by the mantle tissue is responsible for the hydration of CO 2 and the deposition of calcium carbonate on the shell (Le Roy et al. 2012). This function could be under divergent selection between the crab and wave ecotypes in Spain (Table S3, Supporting information), promoting differences in shell structure.
Other outliers are more difficult to assign to specific functions in the barriers to gene flow between these ecotypes, although some functions overlap between outliers (i.e. translation initiation factors, serine-threonine phosphatases) and therefore should be investigated in future studies.
Future work will also link the shared and nonshared outlier regions with phenotypes to understand their role in divergence. Furthermore, with improved genomic resources, we will be able to narrow down the actual targets of selection and study outlier loci within their genomic context. J.G. and R.K.B. developed the idea and A.M.W. and R.K.B. designed the study. A.M.W., J.G. and R.K.B. collected samples. A.M.W., M.P. and R.K.B. analysed the data. A.M.W., J.G., M.P. and R.K.B. wrote the manuscript.

Data accessibility
Coordinates of capture probes in the Littorina saxatilis reference genome, raw sequencing reads and alignments, as well as scripts for calculating p, d xy and F ST and performing the statistical analysis, are available on Dryad (doi: 10.5061/dryad.gd4dc).

Supporting information
Additional supporting information may be found in the online version of this article.
Data S1. Selection of outlier loci. Table S1. Average F ST estimates across control loci (AESD) between sites and between ecotypes. Table S2. Probabilities that outliers sequenced in samples from original and new locations fall above the 95% quantile of the control F ST distribution. Table S3. Outliers from earlier studies that showed highly elevated F ST (above the 95% quantile of the control distribution) in the current study, and that could be functionally annotated.          (Ravinet et al. 2016) falling above the 80% quantile of the F ST distribution for control loci in the current study.       Panova et al. (unpublished).