Drift versus selection as drivers of phenotypic divergence at small spatial scales: The case of Belgjarskógur threespine stickleback

Abstract Divergence in phenotypic traits is facilitated by a combination of natural selection, phenotypic plasticity, gene flow, and genetic drift, whereby the role of drift is expected to be particularly important in small and isolated populations. Separating the components of phenotypic divergence is notoriously difficult, particularly for multivariate phenotypes. Here, we assessed phenotypic divergence of threespine stickleback (Gasterosteus aculeatus) across 19 semi‐interconnected ponds within a small geographic region (~7.5 km2) using comparisons of multivariate phenotypic divergence (PST), neutral genetic (FST), and environmental (EST) variation. We found phenotypic divergence across the ponds in a suite of functionally relevant phenotypic traits, including feeding, defense, and swimming traits, and body shape (geometric morphometric). Comparisons of PSTs with FSTs suggest that phenotypic divergence is predominantly driven by neutral processes or stabilizing selection, whereas phenotypic divergence in defensive traits is in accordance with divergent selection. Comparisons of population pairwise PSTs with ESTs suggest that phenotypic divergence in swimming traits is correlated with prey availability, whereas there were no clear associations between phenotypic divergence and environmental difference in the other phenotypic groups. Overall, our results suggest that phenotypic divergence of these small populations at small geographic scales is largely driven by neutral processes (gene flow, drift), although environmental determinants (natural selection or phenotypic plasticity) may play a role.

diversity over generations (i.e., genetic drift) may also lead to phenotypic divergence (Lynch, 2007). While less noticeable across large heterogeneous landscapes, genetic drift may play an important role in early stage divergence, especially in small populations with limited gene flow (Wright, 1982).
Studies assessing adaptive divergence and the mechanisms of phenotypic divergence are most commonly conducted across large spatial (Belinda & Sgrò, 2017; and environmental (Belinda & Sgrò, 2017;Wilson, Peters, & McCracken, 2012) scales, where populations are expected to be under some degree of natural selection or dispersal limitation. Such large-scale studies clearly indicate that natural selection across environmentally heterogeneous landscapes, often with nonrandom dispersal, facilitates phenotypic divergence, whereas local processes are often less considered (reviewed in Richardson, Urban, Bolnick, & Skelly, 2014).
However, different evolutionary mechanisms do not act in isolation in natural populations, and the relative contribution of drift needs to be assessed to understand the sources of phenotypic differentiation (Clegg & Phillimore, 2010). Whereas the effects of natural selection and gene flow are more apparent in larger populations across heterogenous landscapes, drift is likely to be a particularly important mechanism in driving phenotypic divergence of small populations harboring low genetic diversity, concomitant to small spatial scales where environmental conditions are likely more homogenous (Hallatschek, Hersen, Ramanathan, & Nelson, 2007;Mayr, 1963;Nei & Tajima, 1981).
One way to assess the relative contribution of natural selection and neutral processes (i.e., gene flow and drift) on phenotypic variation, within and among natural populations, is to compare quantitative trait (QST) and neutral genetic (FST) variation (Leinonen, McCairns, O'Hara, & Merilä, 2013). If QST > FST, then divergent natural selection is likely driving phenotypic divergence. If QST = FST, genetic drift is assumed to play a primary role, whereas if QST < FST, stabilizing selection is believed to be at play (Brommer, 2011;Leinonen et al., 2013;Spitze, 1993). Moreover, comparing population pairwise QST and FST estimates in relation to population pairwise estimates of ecological variation (EST) allows insight into the relative roles of divergent natural selection and neutral processes (selection vs. gene flow vs. drift) influencing phenotypic variation within and among populations (Hangartner, Laurila, & Räsänen, 2012;Kaeuffer, Peichel, Bolnick, & Hendry, 2012). Estimating QSTs requires rigorous assessment of additive genetic variance (c) and narrow sense heritability (h 2 ), typically using quantitative genetic breeding designs, which are often not measurable for natural populations. PST, a phenotypic variance-based measure of divergence, provides a field-based proxy for QST (Brommer, 2011). Although not typically used in this context, PST estimates can also provide initial insights to the role of the environment (phenotypic plasticity and natural selection) on phenotypic variation when compared to ESTs (Kaeuffer et al., 2012).
While PST estimates are not as robust as QSTs, they do offer a means to assess phenotypic divergence of natural populations and have been applied in several studies (Kaeuffer et al., 2012;Leinonen, Cano, Makinen, & Merilä, 2006;Raeymaekers, Houdt, Larmuseau, Geldof, & Volckaert, 2007;. However, existing PST-FST comparisons are typically based on univariate traits (meristic or traditional morphometric measurements), which do not account for the multivariate trait complexities that are involved in phenotypic expression, such as homeostasis or canalization (Forsman, 2014). Multivariate statistics are routinely used in ecology and molecular ecology to characterize total variation within and among populations, including geometric morphometrics (Mitteroecker & Gunz, 2009), genetic differentiation (Hartl & Clark, 1997), and species diversity (Seymour, Deiner, & Altermatt, 2016). Applying a multivariate PST-FST comparison offers a more complete assessment of organism level phenotypic variation as opposed to comparing variation across several univariate phenotypic traits (Forsman, 2014;Spitze & Sadler, 1996).
Here, we implement PST-FST and PST-EST comparisons using a multivariate approach on threespine stickleback (Gasterosteus aculeatus) inhabiting a small geographic area in Iceland (Seymour, Räsänen, Holderegger, & Kristjánsson, 2013). Freshwater threespine stickleback ( Figure 1) are well-suited for such studies because they frequently occur in a range of water bodies that differ in connectivity, size, and environmental conditions. Moreover, phenotypic divergence of threespine stickleback across a range of morphological and life-history traits can occur within only a few generations (Barrett et al., 2011;Bell, 1982;Kristjánsson, 2005). Icelandic threespine stickleback are diverse (Kristjánsson, Skulason, & Noakes, 2002;Lucek, Kristjánsson, Skúlason, & Seehausen, 2016;Ólafsdóttir, Snorrason, & Ritchie, 2007), partly due to the high diversity of Icelandic freshwater systems, caused by the interplay of glaciation and volcanic activity (Thorarinsson, 1979). We recently showed that neutral population genetic structure of threespine stickleback across a small pond system (Belgjarskógur, NE Iceland; Figure 2) is influenced by pond isolation and periodic connectedness, due to periods of flooding and drought facilitating or constraining gene flow (Seymour et al., 2013). Belgjarskógur was presumably formed within the last 2,300 years following a volcanic eruption in an area where lava fields-in close connection with groundwater-have created a complex wetland and pond landscape (Einarsson, 1982). Estimates of effective population size (Ne) further indicate that threespine stickleback populations in this area are generally small (Seymour et al., 2013, see below), making this system well-suited to investigate the relative role of environmental selection/plasticity, gene flow, and drift in phenotypic divergence across small spatial scales.
F I G U R E 1 A representative threespine stickleback from a Belgjarskógur pond Here, we assessed the relative contribution of environmental and neutral processes in threespine stickleback within the Belgjarskógur pond system. First, we measured the extent of phenotypic diversification of threespine stickleback across 19 study ponds, focusing on three multivariate meristic/morphometric trait groups (defense, feeding, and swimming traits) as well as body shape using geometric morphometric. Second, to infer whether selective or neutral processes influenced phenotypic divergence, we assessed the relationship between each set of phenotypic traits and neutral genetic divergence using PST-FST comparisons. Third, we investigated the potential effects of environment (influencing both natural selection and phenotypic plasticity) by comparing PSTs to environmental differences (ESTs).

| Study system
Belgjarskógur, northeast of lake Mývatn, Iceland (Figure 2), is a geologically young wetland system (Thorarinsson, 1979) consisting of over a hundred lakes and ponds of various sizes (from a few square meters to ~105,000 m 2 ) on a small geographic scale (~7.5 km 2 ). The system was formed within the last 2,300 years, after lava from the Þrengslaborgir eruption flowed over a large part of the former Lake Mývatn (Einarsson, 1982). Most of the ponds in Belgjarskógur are inhabited by threespine stickleback (Figure 1), which are expected to have colonized the pond system from a single source, most likely from Lake Mývatn, or from the same ancestral source as the lake, followed by the division into many subpopulations.
Our previous study found variable genetic structuring among 19 of the ponds (pairwise FSTs across ponds ranged from 0.007 to 0.141), which was correlated with landscape connectivity associated with periodic flooding events ( Figure 2; Seymour et al., 2013).
Specifically, threespine stickleback in the western part of the system ( Figure 2) showed stronger genetic structure, whereas threespine stickleback in the eastern part were more admixed. The studied ponds ranged from 608 to 105,000 m 2 in size, and estimates of effective population size (N e ) indicated small populations (N e = 12-86; Seymour et al., 2013) and high potential for local genetic bottlenecks. The collection process was repeated daily until at least 30 adult size (>30 mm in total length) threespine stickleback were caught in each study pond. Upon capture, threespine stickleback were euthanized using an overdose of phenoxyethanol, measured for total length (to the nearest 0.1 cm), and then frozen for later morphological and genetic analyses. After eliminating individuals due to physical deformations or destroyed feeding structures (due to freezing and storage), the final sample size was 15 to 47 individuals per pond (3-24 males F I G U R E 2 Map of the study area in Belgjarskógur, Iceland, showing 19 study ponds harboring threespine stickleback. The blue hatched areas indicate the study ponds. Top left insert shows an outline of Iceland and the geographic location of Belgjarskógur area represented as a black dot and 7-30 females per pond, total N = 670). Removal of deformed animals was done to avoid errors in subsequent analyses that would not reflect natural population variation.

| Environmental measurements
To characterize the environment that threespine stickleback experience, we recorded water chemistry parameters, pond size, and macroinvertebrate community diversity. Water chemistry was assessed for each pond using a YSI 600 XLM multiprobe sonde (YSI Incorporate) to measure pH (±0.2 units) and total dissolved solids (TDS; ppm). Chemistry was measured once in late June 2009 at each study pond. Total dissolved solids (TDS) were log 10 transformed for statistical analyses (see below). To assess pond size, surface area of each pond was estimated from high-resolution (2.5-m pan-sharp-

| Invertebrates
To describe the feeding environment of threespine stickleback within each pond, invertebrate samples were collected just below the water surface and above the pond bottom by moving a hand net (153 micron mesh) for three minutes back and forth while moving forward at a slow pace. Samples were then rinsed and stored in 70% ethanol for later identification. All invertebrates retained in these samples were identified to various taxonomic levels (phylum: Mollusca, Nematoda; class: Acari, Coleopteran larvae, Collembola, Copepoda; order: Cladocera, Hymenoptera, Trichopteran larvae; family: Aphidoidea, Chironomidae larvae) using a Leica MZ12 (Nussloch) stereo microscope. If a given sample consisted of 200 or fewer individuals, all individuals were identified. If a given sample consisted of more than 200 individuals, the sample was thoroughly mixed and divided into aliquots. All individuals within an aliquot were then counted and the number of individuals in that aliquot multiplied by the total number of aliquots. If the number of individuals in the first aliquot was less than 200, the next aliquot was counted, and so on, until there were more than 200 individuals prior to calculating the total number. For statistical analyses, invertebrates were classified into three groups: limnetic (Copepoda, Cladocera), benthic (Mollusca, Nematoda, Acari, Coleopteran larvae, Chironomidae larvae, Trichoptern larvae), and "other" (Collembola, Hymenoptera, Aphidoidea) following (Schluter & McPhail, 1992).

| Stomach content
To characterize diet, threespine stickleback stomachs were extracted from 670 individuals across the 19 study populations (10-30 individuals/pond). Stomachs were opened and contents scraped out with forceps to insure all contents were removed. All invertebrates retained in these samples were counted and identified to the lowest possible taxonomic level, using the same groups as above, using a Leica MZ12 stereo microscope. Partially digested items were not always identifiable and softer bodied prey (such as copepods or worms) were rarely found, whereas shells of Cladocera, Mollusca, and Chironomidae larvae, which take longer to digest, were more frequent. For statistical analyses, prey items were classified into limnetic, benthic, and other using the same groupings as for the pond invertebrate sampling.

| Phenotypic variation
Threespine stickleback phenotypic variation was characterized by measuring linear morphometric traits or counts of meristic traits, as well as using geometric morphometrics to assess body shape. For phenotypic analyses, threespine stickleback were thawed, their stomachs removed, sexed by examining gonad morphology, and preserved in 70% ethanol, and subsequently fixed in 5% buffered formalin for three weeks. Prior to further processing, the formalinfixed fish were rinsed with water and transferred to 70% ethanol.
For geometric morphometric analyses of shape, each formalin-fixed fish was pinned onto a wax board, with spines spread out and mouth closed, and photographed on the left side using a Nikon (Tokyo, Japan) Coolpix 4500 digital camera (4 megapixels, Figure 1). Fish were subsequently bleached (1:1 ratio of 3% H 2 O 2 and 1% KOH) and stained (Alizarin red 1% KOH) to aid measurement of traditional morphometric and meristic traits.

| Feeding morphology
For morphometric/meristic measurements of feeding traits (gill raker number, gill raker length, gill raker gap width), the first gill arch from each stained fish was removed, placed between two glass plates to assure a straight position, and photographed using a Nikon Coolpix 4500 (Nikon) mounted onto Leica MZ12 stereo microscope. Total gill raker number (GRN) across the long and short gill raker arcs, the length of the first five gill rakers on the long arc (GRL) and the width of the gaps between these five gill rakers (GW) were measured (to the nearest 0.1 mm) from the gill raker pictures using the public domain ImageJ software (Abramoff, Magelhaes, & Ram, 2004).
Individual average GRLs and average GWs were then calculated for subsequent analyses.

| Swimming morphology
Swimming morphometric traits were assessed by measuring the length of the pectoral, anal, caudal, and dorsal fins (to the nearest 0.1 mm) using an ocular micrometer mounted on a Leica MZ12 stereo microscope and counting the number of fin rays on each fin.

| Defense morphology
Defense traits were characterized by counting the number of lateral plates from the left side of the fish and measuring the length of both long dorsal spines (to the nearest 0.1 mm) using an ocular micrometer. Average spine length was used in subsequent statistical analyses.

| Body shape
Body shape of each fish was assessed using geometric morphometrics using the program suite TPS (Rohlf, 2006). Before digitizing landmarks, photographs of the fish were randomly ordered and 11 fixed landmarks digitized on each photograph using the program TpsDig2.
Landmarks were selected based on key morphological characteristics, focusing on the head (to capture trophic morphology) and fins (to capture locomotion morphology). Individuals were not included in these analyses if the mouth of the fish was open or the image was of poor quality, which would have reduced the accuracy of placing landmarks. This resulted in 18 fish being removed from the analysis (final N = 643). Digitized fish were aligned using generalized Procrustes superimposition using the geomorph package in R (Adams & Otárola-Castillo, 2013). Briefly, a generalized Procrustes analysis (GPA) aligns all specimens based on position and angle and scales each specimen to a common size. The final Procrustes coordinates for each fish represents the size-free shape of each fish (Rohlf & Bookstein, 2003). A PCA of the Procrustes coordinates was then calculated following Caumul and Polly (2005), which were used as the body shape matrix for the corresponding multivariate shape PST statistics detailed below.

| Population genetic variation
Neutral genetic variation was assessed using 12 microsatellites (average allelic richness 2.7-4.04) to assess dispersal patterns in relation to landscape cover as previously reported (Seymour et al., 2013).
These analyses found strong indications of isolation by distance, as determined through least cost pathways through interpolated floodplains. The average pond pairwise FSTs was 0.084 (Seymour et al., 2013), and we use here use pond pairwise FSTs from Seymour et al.
(2013) as a measure of neutral genetic divergence.

| Statistical analyses
All statistical analyses were performed in R (version 3.5.1, R Development Core Team, 2018). Morphometric measures that correlated with body size (i.e., GRL, GW, gill raker arc length, spine, and fin lengths) were size corrected following Reist (1986) by calculating size-corrected residuals for each trait. Size-corrected residuals were then used as response variables in further statistical analyses for these traits.

| Phenotypic divergence
We first assessed morphological variation among all ponds using multivariate analyses of variance (MANOVA). For each MANOVA, we assessed the effect of pond identity (19 levels) and sex (male and female) on the response trait matrix (feeding, swimming, and defense) using the lm function in R. Because of the small sample size for males or females in some populations, we did not test for the population and sex interactions. Subsequently, we assessed individual trait variation among populations using ANOVAs, which are a special case of linear regression, via the ANOVA function in R, with pond identity and sex as explanatory variables. Response variables included body length, pectoral fin length, dorsal fin length, caudal fin ray number, pectoral fin ray number, GRN, GRL, GW, dorsal spine length, and number of lateral plates. Length measurements were size corrected as previously mentioned. Geometric morphometric shape variation across the ponds was assessed using Procrustes ANOVAs with the procD.lm function in the geomorph package, with Procrustes coordinates as response variables and pond identity and sex as the explanatory variables, with 1,000 residual randomized permutations. Procrustes ANOVA allows for statistical assessment of the term (here: pond identity and sex) using Procrustes distances among specimens and is equivalent to distance-based ANOVA, allowing comparison with the analyses of the morphometric/meristic variables stated prior (Anderson, 2001). To check for allometric trends across the ponds, we performed a preliminary Procrustes ANOVA that included centroid size × pond identity interaction as a covariate. While we found a significant difference in centroid size across ponds (p < 0.01) and pond identity (p < 0.01), we also found a significant effect of centroid × pond identity interaction (p = 0.036), indicating that allometric relationships are not the same in all the ponds. Therefore, we elected to not to include size as a co-variant for the final analysis due to the nongeneral allometric trends across the ponds and underlying importance of the allometric variation in describing the observed phenotypic variation (Klingenberg, 2016).

| PST calculation
PST was calculated for each pond using the formula: where 2 GB is the variation between ponds, 2 GW is the variation within ponds, h 2 is trait heritability, and c is a unit-less scalar variable (Brommer, 2011). Variation in traits between and within each pond pair were calculated using redundancy analysis (RDA), which partitioned the within and among pond variation of each multivariate trait matrix (the response variable) with pond as the among group explanatory variable. We used heritability of 0.5 (i.e., 50% of phenotypic variation is genetically based and due to additive genetic variance) and a scalar value of 1 (i.e., 100% of variance among populations is due to additive genetic variance, Brommer, 2011) for further analyses. These assumed values are commonly used in other QST and PST studies (Brommer, 2011;e.g., Leinonen et al., 2013). However, we examined the change in PST across all combinations of a range of heritabilities and scalar values (from 0.1 to 1 in 0.1 increments). This showed us that changing these variables did not greatly influence the results (Figure 3). Furthermore, her- h 2 = 0.58, (Leinonen et al., 2011) spine length: h 2 = 0.61 to 0.66 (Leinonen et al., 2011;Loehr et al., 2012)). Very high heritabilities are rare, but h 2 = 0.84 has been recorded for plate counts in some populations (Hagen, 1973).

| PST versus FST
We first calculated the mean PST and FST for each pond from the initial pairwise PST or FST calculations (see above) to avoid violating the assumption of independence in the subsequent linear models. These pond mean PSTs and FSTs were then compared, using the lm function in R to perform a linear regression, with FST as the response variable and PST as the explanatory variable. Given that our study utilizes a set of sampling sites that is large enough to compare means across all populations, we opted for a mean-based comparison of PST-FST and PST-EST. Assessing population means is a conservative approach compared to using analyses of pairwise distance matrices, such as Mantel test (Guillot & Rousset, 2013).
We tested the statistical difference between pond mean PSTs and FST using the ANOVA function in R, with pond mean values (N = 19) as the response and FST/PST group as the explanatory.
We removed two outlier points (final N = 17) for the defense PST versus FST significant test to maintain the assumption of normality for the ANOVA and to ensure the outliers were not influencing the test.

| Environmental variation
Total dissolved solids ranged from 0.042 to 0.306 ppm, pH from 7.26 to 9.38 (reflecting neutral to alkaline ponds), and pond surface area from 608 to 105,000 m 2 across the ponds. Proportions of sampled limnetic to benthic invertebrates were high across the ponds
Pectoral fin length, dorsal fin ray numbers, and caudal fin ray numbers (all p < 0.02, Table 1) differed between sexes, while pectoral fin ray number did not (p = 0.07).

| Geometric morphometric body shape
Procrustes ANOVA of the Procrustes coordinate matrix indicated threespine stickleback differed in body shape among ponds (p = 0.01, Table 1; Figure 4), but not between the sexes (p = 0.16).

| PST-FST and PST-EST comparisons
We tested the effect of changing values for h 2 and c (Figure 3). The results were quite robust, justifying our selection of h 2 = 0.5 and c = 1. Mean PSTs across ponds for feeding morphology, swimming TA B L E 1 Analyses of variance (ANOVA) for effects of pond identity on meristic phenotypic traits, grouped by trait type (body length, swimming, feeding, and defense) and Procrustes ANOVA of the effects of pond identity on the geometric morphometricderived Procrustes coordinate matrix of body shape morphology, and body shape were greater than pond FSTs only when h 2 was low (h 2 < 0.2-0.3; Figure 3). In contrast, defense traits showed consistently greater mean PSTs than FSTs across the range of h 2 and c (Figure 3). Using a h 2 = 1 (i.e., trait variation among populations is fully genetically determined) generally greatly altered the magnitude and variation of the PST calculations, whereas altering c at h 2 < 1 had little effect on any of the PST calculations (Figure 3).
Using h 2 = 0.5 in our calculation of PST, we found a significant positive relationship between defense PST and FST (R 2 = 0.404, p = 0.002), with defense PST significantly greater compared to FST (p = 0.02; Figure 5). We checked that the PST-FST linear relationship in defense traits was not driven by outliers, by removing the two extreme populations and reanalyzing the model, whereby the results were still significant (R 2 = 0.245, p = 0.025). We found nonsignificant PST-FST relationships for feeding (R 2 = −0.048, p = 0.687), swimming (R 2 = −0.058, p = 0.91), and shape (R 2 = −0.041, p = 0.596) traits (Table 2). Pond mean PSTs were significantly lower than FST for swimming traits (p < 0.01) and body shape (p = 0.02; Figure 5).
Feeding traits pond mean PSTs were not significantly different from FST (p = 0.14; Figure 5).
There was a positive relationship between PSTs of swimming morphology and ESTs represented as pond limnetic invertebrate communities (R 2 = 0.230, p = 0.022), and a positive relationship between PSTs and EST as limnetic prey (R 2 = 0.161, p = 0.050; (Table 3).
All other PST-EST relationships, including associations with abiotic variation, were nonsignificant (Table 3).
F I G U R E 4 Geometric morphometricbased Procrustes plots of the mean fish shape from each pond (M04 to P33) relative to the mean reference fish shape across all populations. Black dots correspond to landmarks

| D ISCUSS I ON
We found clear differences among phenotypes of threespine stickleback across 19 small, intermittently connected, ponds. These phenotypic differences were seen in meristic/morphometric feeding, defense, and swimming traits and in body shape (geometric morphometric). Phenotypic divergence (population pairwise PSTs) in defense traits was greater than neutral genetic differentiation (population pairwise FSTs), indicating divergent selection or phenotypic plasticity. There were no clear difference between phenotypic divergence and neutral genetic differentiation for swimming traits, feeding traits, or body shape, suggesting that divergence in these traits may be mostly influenced by genetic drift or stabilizing selection. We found little evidence for the role of environmental divergence (EST) in phenotypic divergence, which may partly reflect relative environmental similarities of the ponds across the study area. The exception was that divergence in swimming traits positively correlated with pond differences in the proportion of limnetic invertebrates in the environment and in stickleback diet. Overall, our results suggest that phenotypic divergence in these small populations is driven by a combination of neutral processes (gene flow, drift) and either natural selection or phenotypic plasticity in response to small-scale environmental variation.
Defensive traits showed stronger phenotypic divergence relative to neutral genetic divergence, regardless of assumed heritability, which suggests that divergent selection or phenotypic plasticity may have promoted differentiation in defensive traits across our study ponds. The among pond divergence in these traits was most apparent among the southern and westernmost ponds, which have previously been shown to have greater dispersal limitation compared to the more admixed northeastern ponds (Seymour et al., 2013 TA B L E 2 Linear regression statistics for effects of FST on PST, grouped by PST measure spine lengths) in threespine stickleback are commonly under divergent selection via predation pressure (Barrett, Rogers, & Schluter, 2008). The defensive traits of ancestral marine threespine stickleback likely evolved when gape-limited predators, including fish and birds, promoted the selection of long spines and numerous lateral plates (Bell & Foster, 1994). In freshwater populations, this selection has commonly been reversed, although there is considerable variation (Bell & Foster, 1994). In Belgjarskógur the predation of threespine stickleback is suspected to be primarily avian, for example, by the horned grebe (Podiceps auritus), red-breasted merganser (Mergus serrator), and great northern diver (Gavia immer), which migrate to the area for breeding. Fish predation is expected to be rare in these ponds due to their small size. Larger ponds may contain Arctic charr (Salvelinus alpinus) and brown trout (Salmo trutta; Einarsson et al., 2004) Here, we decided to include a sensitivity analysis of the PST calculation, which supported our selection of values for h 2 and c, but also highlights the importance of narrow sense heritability when calculating PST.
In threespine stickleback, highly heritable traits include lateral plate number, gill raker number, and dorsal fin rays, whereas body size, body shape, gill raker length, and gap width, as well as spine length, tend to be more plastic with narrow sense heritability ranging between 0.3 and 0.6 (Hermida, Fernandez, Amaro, & San Miguel, 2002). For this study, we assumed narrow sense heritability across the multivariate traits groups to be 0.5, based on existing literature and comparable PST/QST study assumptions. While assuming a mean narrow sense heritability of 0.5, which is comparable to heritability estimates in plastic length-based threespine stickleback traits (Leinonen et al., 2011), it is likely that the more plastic phenotypes (lower heritability) in our study (i.e., body shape and swimming structures), are highly influenced by drift or potentially stabilizing selection (i.e., PST < FST). Traits that are likely less plastic (higher heritability), for example, feeding structures, are more likely influenced by drift and could possibly be under some stabilizing selection if we assumed an elevated h 2 of 0.6 or greater, although elevated heritability values above 0.5 are not common for these traits (Leinonen et al., 2011;Loehr et al., 2012).

| Drivers of phenotypic divergence: environment versus neutral processes
We found that variation in swimming trait of stickleback was positively correlated with the relative abundance of limnetic prey (e.g., copepods and cladocerans) in the diet and the availability of limnetic prey in the ponds. Threespine stickleback morphology is often linked to availability of limnetic versus benthic prey (Bell & Foster, 1994), whereby stickleback develop longer gill rakers, larger fins (relative to body size), and more streamlined bodies to capture limnetic prey  (Willacker, Hippel, Wilton, & Walton, 2010). While previous studies have shown clear benthic-limnetic divergence in threespine stickleback in several lakes across the northern hemisphere, the differentiation is much weaker within the shallow Lake Mývatn, where divergence of stickleback is related to benthic habitats (Millet, Kristjánsson, Einarsson, & Räsänen, 2013). Likewise, Belgjarskógur threespine stickleback swimming morphology may be a plastic response to more limnetic feeding strategies in order to chase limnetic prey that are more abundant in the water column (Day & McPhail, 1996)-as reflected in the similar relationship between swimming PST and prey availability and diet ESTs. Based on the PST-FST and PST-EST comparisons, swimming traits may be under stabilizing selection due combinations of small effective population size and seasonal changes in prey availability. Whereas, body shape and feeding traits seem to reflect drift (PST = FST) across small geographic and environmental scales (Whitney, Bowen, & Karl, 2018). Given the limited number of generations since Belgjarskógur separated from the larger Mývatn system (~2,500 threespine stickleback generations), the observed phenotypic variation between ponds may be a recent example of genetic drift resulting from genetic sorting and range expansion following localized extinction (Hallatschek et al., 2007).

| CON CLUS IONS
Here, we examined the phenotypic divergence of threespine stick-

ACK N OWLED G M ENTS
We would like to thank the Mývatn Research Station and Árni Einarsson for providing a laboratory space, living quarters, and field-related assistance; The Genetic Diversity Center (GDC), Zurich, Switzerland, for genetic processing; Antoine Millet and Jeremiah Psiropoulos for field assistance; Jón S. Ólafsson and Daniel P. Govoni for invertebrate identification assistance. This study was funded by an Icelandic research council (Rannís) research grant #090225022.

CO N FLI C T O F I NTE R E S T S
Authors declare no competing interest.

AUTH O R CO NTR I B UTI O N S
BKK, KR, and MS designed the study and wrote the paper. MS conducted the fieldwork, performed the laboratory and data analyses, and lead the writing. All authors commented on manuscript drafts.

DATA ACCE SS I B I LIT Y
Data for the manuscript are available on dryad https ://doi. org/10.5061/dryad.5vf368h.