Parallelism in eco-morphology and gene expression despite variable evolutionary and genomic backgrounds in a Holarctic fish

Understanding the extent to which ecological divergence is repeatable is essential for predicting responses of biodiversity to environmental change. Here we test the predictability of evolution, from genotype to phenotype, by studying parallel evolution in a salmonid fish, Arctic charr (Salvelinus alpinus), across eleven replicate sympatric ecotype pairs (benthivorous-planktivorous and planktivorous-piscivorous) and two evolutionary lineages. We found considerable variability in eco-morphological divergence, with several traits related to foraging (eye diameter, pectoral fin length) being highly parallel even across lineages. This suggests repeated and predictable adaptation to environment. Consistent with ancestral genetic variation, hundreds of loci were associated with ecotype divergence within lineages of which eight were shared across lineages. This shared genetic variation was maintained despite variation in evolutionary histories, ranging from postglacial divergence in sympatry (ca. 10-15kya) to pre-glacial divergence (ca. 20-40kya) with postglacial secondary contact. Transcriptome-wide gene expression (44,102 genes) was highly parallel across replicates, involved biological processes characteristic of ecotype morphology and physiology, and revealed parallelism at the level of regulatory networks. This expression divergence was not only plastic but in part genetically controlled by parallel cis-eQTL. Lastly, we found that the magnitude of phenotypic divergence was largely correlated with the genetic differentiation and gene expression divergence. In contrast, the direction of phenotypic change was mostly determined by the interplay of adaptive genetic variation, gene expression, and ecosystem size. Ecosystem size further explained variation in putatively adaptive, ecotype-associated genomic patterns within and across lineages, highlighting the role of environmental variation and stochasticity in parallel evolution. Together, our findings demonstrate the parallel evolution of eco-morphology and gene expression within and across evolutionary lineages, which is controlled by the interplay of environmental stochasticity and evolutionary contingencies, largely overcoming variable evolutionary histories and genomic backgrounds.


Introduction
The degree to which the pathways of evolution are predictable, particularly under complex natural conditions, remains one of the greatest questions in evolutionary biology [1,2]. Numerous species in nature have repeatedly evolved similar phenotypes in response to similar environmental challenges, strongly suggesting repeatability and predictability of evolutionary trajectories [3][4][5][6][7] and highlighting the pervasive role of natural selection in evolution [8,9]. Despite this, extensive variation in the magnitude and direction of evolutionary trajectories has been observed in some classic examples of 'parallel evolution' [7,[10][11][12]. Stochastic factors such as differences in the local environment, gene flow, and selection regimes, or contingencies such as genomic background, demographic history, and the genetic architecture of adaptive traits, can lead to departures from phenotypic parallelism and non-parallelism at the genomic level [6,7,13]. Despite a growing number of studies into the parallelism or convergence of evolutionary trajectories, our ability to predict evolutionary outcomes is still limited [14]. To improve predictability, it is critical to understand the evolutionary routes leading to replicated ecological divergence in a range of independent systems [15] and to disentangle the impact of various contingent and stochastic factors on parallel evolution.
Evolutionary predictability is difficult to quantify in a biologically meaningful way when selection is multifarious and traits are quantitative [10,14]. Multiple statistical frameworks have been developed recently to address parallel evolution in natural systems, as a proxy for predictability of evolution, consistency of selection gradients, and pervasiveness of deterministic processes [16,17]. Parallel evolution of phenotypes can be inferred from specific traits, overall morphology, and/or ecological niche [7,12,18,19] and can be quantified as replicated evolutionary trajectories [16] or as the amount of variation explained by ecotype [17]. Parallel evolution at the genetic level is evidenced by shared genes and genetic regions underpinning parallel phenotypes across replicates, either due to repeated de novo mutation, recruitment of shared standing genetic variation, or introgression [3,13]. Furthermore, parallel evolution can potentially occur through convergent functional genomic changes despite non-parallel genetic changes, e.g. through differential expression of the same genes or genes in the same pathway [3,[20][21][22][23]. Here, we describe parallel evolution as the replicated, independent evolution of quantitatively similar adaptive phenotypes (ecotypes) by similar genomic underpinnings [3]. The study of replicated natural systems at different organismal levels is critical to identifying the genetic, environmental, and selective components underlying parallel evolution and its deviations but to date has been underexplored [11,24].
The replicated ecological and morphological post-glacial diversification of fishes into distinct trophic specialists in freshwater lakes is a powerful natural experiment for testing evolutionary predictability [25][26][27]. Divergence in such fishes typically occurs due to influences of disruptive selection and often involves the independent evolution of similar ecotypes [10,[28][29][30][31][32]. In the Holarctic salmonid species Arctic charr (Salvelinus alpinus), sympatric ecotype pairs that differ in several heritable phenotypic traits, such as body shape, body size, trophically relevant morphology, and life history, are abundantly replicated [26,27,[33][34][35]. Ecotypes distinguished by diet and foraging tactics include: Planktivores and Piscivores which feed in the pelagic, and Benthivores which feed in the benthic-profundal or benthic-littoral zone [26,27,33,34]. These ecotypes can be found across the distribution of Arctic charr [26,27] and most abundantly in the Atlantic and Siberian lineages, which likely diverged before the last glacial maximum during the Pleistocene [36]. Arctic charr ecotypes likely evolved following the last glacial maximum (around 10,000-20,000 years ago), after the colonization of newly formed postglacial lakes by putatively pelagic charr from different glacial refugia populations. However, the phenotypic parallelism, evolutionary histories, and adaptive genomic responses of ecotypes across lakes and lineages has never been investigated.
Here we tested the extent of parallelism in phenotype, evolutionary history, genomic patterns, and gene expression in repeated divergences of Arctic charr in their environmental context from eleven lakes within and across two distinct evolutionary and geographic lineages (Atlantic and Siberian) (Fig 1). To do so we contextualised the contemporary population genomic variation by resolving the historical evolution and demography. We propose that parallel evolution of ecotypes will be evident in significant similarity of phenotypes and significant sharing of putatively adaptively relevant genomic regions [3,10,[13][14][15]18]. Finally, to explicitly test the power to predict evolution in this system, we investigated how variation at different intrinsic organisational levels, such as genomic variation or gene expression, and extrinsic factors such as ecosystem size, correlate with phenotypic parallelism. Overall, we show that parallel adaptive phenotypes evolved repeatedly and independently within and across lineages despite differing evolutionary histories. Our findings suggest that the extent of parallel evolution is shaped by the interplay of genomic, transcriptomic and environmental variation.

Phenotypic divergence and parallelism
To test for parallelism in ecologically relevant phenotypes, we assessed morphology based on seven linear traits measurements of Arctic charr from the Atlantic lineage (Scotland) and Siberian lineage (Transbaikalia) (Fig 1F; total N = 1,329 individuals). This included eleven replicate ecotype pairs: six benthivorous-planktivorous and five piscivorous-planktivorous combinations (Fig 1A-1E; S1 Table). Additionally, three populations were included as outgroup comparisons in morphological analyses (piscivorous ecotype from Kudushkit, planktivorous ecotypes from Maloe and Bol'shoe Leprindo) due to the absence of phenotypic information for sympatric sister ecotypes, and one population (Tokko) with a benthivorous and mainly insectivorous ecotype that has no population replicate. These populations were excluded from the statistical parallelism analysis but included in genetic analyses.
Principal component analyses separated sympatric ecotypes along PC1 (52.4%) and PC3 (10.1%), while the two evolutionary lineages separated along PC2 (20.8%) (Fig 2A, S1B Fig). This suggests strong phenotypic parallelism in ecotype divergence within and across evolutionary lineages. Phenotypic trajectory angles between replicated ecotype pairs and between non-replicated ecotype pairs, with mean angles highlighted by dashed lines. Angles between replicated ecotype pairs were significantly smaller compared to non-replicated ecotype pairs (Wilcoxon test: P < 0.001) (C) Effect sizes (partial η 2 ) of the ecotype and 'ecotype x lake' interaction terms for all seven linear traits (dark dots) and PC1 to PC4 (white dots). Traits above the dashed diagonal line show stronger parallel than non-parallel divergence across ecotype pairs. (D) Mean size-adjusted trait-values and mean fork length (in mm) are plotted for each benthivorous-planktivorous and piscivorous-planktivorous ecotype pair, with means for sympatric pair being connected by a line. These reaction norms are colour coded blue and red highlighting the decrease or increase of trait values, respectively, between benthivorous (bn) or piscivorous (pisc) and planktivorous (pl) ecotypes. The measured traits are illustrated next to each plot for the benthivorous-planktivorous pairs. See text, panel 1C and S1 Fig for details

PLOS GENETICS
To quantify the direction and magnitude (length and direction of trajectories) of the phenotypic divergences across replicate sympatric ecotypes, we used a phenotypic trajectory approach on all traits combined [7,37,38]. The length of these trajectories (L) describes the magnitude of divergence and the angle (θ) between trajectories describes their direction through multi-trait space (see [37] for details; Fig 1F). Thus, the difference in phenotypic trajectory length (ΔL P ) and the direction of phenotypic trajectories (θ P ) define the extent of multivariate phenotypic parallelism; completely parallel ecotype pairs are those diverged to the same extent (ΔL P not different from zero) and in the same direction (θ P angle not different from zero) [12,37].
Second, to deconstruct the extent of parallelism in specific traits related to trophic morphology, habitat use, and swimming ability (Fig 1F), we used a trait-by-trait linear modelling approach across all ecotype pairs. We found that the parallel divergence elements of the model (the ecotype term) explained more phenotypic variance (partial-eta-squared: η 2 ) than the nonparallel elements of divergence ('ecotype x lake' and 'ecotype x lineage' interaction terms) for all traits, ranging from η 2 eco = 0.07 for HL to η 2 eco = 0.57 for HDO ( Fig 2C,. S1A Fig). However, as indicated by the significant non-parallel interaction terms, trait differences between ecotypes varied across populations, ranging from highly parallel to antiparallel in some populations and traits (Fig 2D, S3 Table). In combination, these results suggest that although the absolute trait values differ in each population or lineage the divergence between ecotypes is largely predictable across lakes and lineages.
Three traits in particular were significantly parallel: eye diameter, pectoral fin length, and head depth. For eye diameter (ED; η 2 eco = 0.31) and pectoral fin length (PFL; η 2 eco = 0.42), ecotype explained the highest proportion of phenotypic variation (Fig 2C and 2D, S1 Fig, S3 Table). This indicates that across lakes and also across evolutionary lineages, ecotypes are consistently diverged in those traits, which are closely associated with habitat use (ED) and swimming performance (PFL) [39]. Furthermore, head depth at the operculum (HDO) had a larger amount of variation explained by ecotype (η 2 eco = 0.57) compared to the lake effect (η 2 lake = 0.09), suggesting that this trait is under strong natural selection. The consistent variation in head depth across lineages (η 2 lineage = 0.92) could potentially be explained by the presence of deep-headed piscivorous ecotypes in the Siberian lineages.
Overall, these results indicate that ecologically replicated Arctic charr ecotypes show phenotypic parallelism, in both direction and magnitude of divergence, although at variable levels between populations and morphological traits.

Contemporary population genetic structure
To test if replicated ecotypes most likely evolved in parallel independently, we analysed population genetic co-ancestry, introgression, and genetic differentiation within and across evolutionary lineages. Based on a global SNP dataset of all individuals (N = 12,215 SNPs from ddRADseq; N = 630 individuals), we found that principal components clearly distinguished the two evolutionary lineages, clustered individuals by lake of origin, and further clustered lakes by broader river catchment [40,41] (Fig 3A; S2 Fig). This structure is also supported by haplotype-based genetic co-ancestry analyses (Fig 3B and 3C, S3 Fig, S1 Table). Histories of independent colonization were generally also supported by near complete mitochondrial haplotype sharing of individuals within lakes, and in some cases across nearby lakes (S4 Fig) [42].
However, the hierarchical population genetic clustering deviated on some occasions. We found elevated co-ancestry between geographically distinct populations (e.g. Kamkanda and Tokko in Fig 3C). In two cases, sympatric ecotype pairs formed polyphyletic genetic clusters ( Fig 3D; S5 Fig), with one ecotype being genetically more similar to the ecotype from a neighbouring lake than to its sympatric pair (piscivorous ecotypes in Kiryalta-3 and Kiryalta-4, planktivores from Dughaill and Uaine; Fig 3B-3E; S2-S5 Figs). These instances suggest introgression and non-independent divergences within and across lakes.
Therefore, we used f-statistics and D-statistics to test the role of introgression in repeated ecotype divergence in more detail (see S1 Text for detailed explanation; S4 Table and S5 Table; S5 and S6 Figs). These analyses showed that although signals of introgression were detected across many populations (S6 Fig), introgression was mostly not specific to replicated ecotypes from different lakes but rather detected across different ecotypes within and across lakes.
Admixture analyses further suggested that sympatric ecotype pairs differed widely in their degree of genetic admixture (Fig 3D-3F). The distribution of genome-wide genetic differentiation was also highly variable across populations (mean Fst ranging from 0.011 to 0.329, S6 Table), with differentiation between sympatric ecotypes in some cases being higher than between allopatric comparisons (S7 Fig). In two populations in the Atlantic lineage, Awe and na Sealga, no significant genome-wide genetic differentiation could be detected between sympatric ecotypes (Fig 3E). Overall these results suggest that most ecotype pairs have diverged independently and that the contemporary population genetic structure is highly variable across replicates.

Evolutionary histories of ecotype divergence
To test the association of evolutionary history and demographic parameters with ecotype parallel evolution [6,7,13], we used coalescence simulations implemented in fastsimcoal2 [43] to model the history and demography for each sympatric ecotype pair (two-population and three-population models, S8 Fig). We found that 11 out of 13 ecotype pairs likely evolved following postglacial secondary contact and admixture of ancestral populations that had diverged prior to the last glacial maximum (LGM) (Fig 4; S9 Fig, S7 Table). Variations of the secondary contact (SC) model had the best fit in most Siberian populations. Isolation-with-migration and introgression (IMint) models had the strongest support in two Atlantic lineage populations, Dughaill and Tay.   [44]. It should be noted that competing models could not be excluded in all populations, such as in Kamkanda (S7 Table), so higher density genomic data will be needed to better resolve these complex evolutionary histories.
The demographic parameters inferred from the most likely model for each population varied considerably in the initial divergence times between ancestral populations (910-16,690 generations ago) and the timing of secondary contact (56-3,

Shared and unique signatures of genetic differentiation between sympatric ecotypes
To determine if parallel ecotypes evolved with similar genomic bases, we examined the extent of Fst outlier sharing, jointly examined parallel signals of selection across replicated ecotype pairs, and identified putatively adaptive loci within and across lineages.
We found that ecotype pairs did not share more outlier loci (top 5%-Fst outlier loci; S10A-S10C Fig, S6 Table), or contigs containing outlier loci, than expected by chance in pairwise comparisons ( Fig 5A). Sharing was higher when outlier SNPs were inferred based on a permutated null distribution rather than the empirical Fst distribution (S11 Fig). This is a more liberal test, but the outcome suggests that outlier SNPs in one ecotype pair often show increased differentiation in another pair, potentially through hitchhiking with a shared causal SNP [45]. However the genome-wide density of ddRADseq-based SNPs is 2.9 ± 2.7 SNPs/Mb (mean ± s. d.) across populations and the decay of linkage disequilibrium (LD) between Fst outlier SNPs and non-outlier SNPs to background levels is less than <500 kb in most populations (S12 Fig). This decay is less than the average distance between Fst outlier SNPs and neighbouring SNPs (678,248 ± 165,075 kb) but larger than the average contig length (89kb ± 361kb), so these Fst outlier approaches likely underestimate genomic parallelism.
To maximise our power to detect shared outlier loci with this population genomic dataset, we used two analytical approaches that jointly test for convergent signals of divergent selection and shared patterns of allele frequency differentiation. First, using a hierarchical implementation of bayescan, we detected signatures of parallel selection across the benthivorous-planktivorous ecotype pairs in the Atlantic lineage (33 SNPs, FDR < 0.1) and piscivorousplanktivorous populations in the Siberian lineage (26 SNPs, FDR < 0.1), but not in benthivorous-planktivorous from the Siberian lineage ( Fig 5B, S10D Fig). None of the SNPs showed signs of parallel divergent selection in both lineages.
Second, using redundancy analyses, we identified 217 and 303 SNPs showing ecotype-associated allele frequency differences (z-score > 2) across lakes within the Atlantic and Siberian lineages, respectively ( Fig 5C; S13 Fig). These ecotype-associated SNPs explained 2.9% (benthivorous-planktivorous in Atlantic lineage) and 4.2% (benthivorous-planktivorous-piscivorous in the Siberian lineage) of the overall genetic variation between ecotypes within each lineage. Of these, eight SNPs from independent genomic regions on seven different chromosomes (S8 Table) were shared across lineages, which is more than expected by chance (χ 2 -square; P<0.001) and suggests some genomic parallelism across lineages. In the Siberian lineage, two SNPs were detected to be under both parallel divergent selection (bayescan) and also associated with ecotype divergence (RDA).

Parallel divergence in gene expression
To test if regulatory variation would show functional parallelism across ecotype pairs due to integrated effects of plasticity and genetically-mediated expression, we analysed genome-wide gene expression in white muscle between ecotypes from a subset of five lakes (N = 44 individuals, 30,849 genes; S1 Table). Across these, we compared differential expression, co-expression modules, and biological pathways.
Similar to population genetic patterns, we found a continuum of divergence in gene expression across ecotype pairs (Fig 6A; S14A Fig). Contrary to the genomic analysis, gene expression patterns were highly similar across lakes and ecotype pairs, with ecotype explaining most of the expression variation along PC1 (Fig 6A; η 2 Eco,PC1 = 0.80, P < 0.001) and more than the non-parallel interaction terms (non-significant except for PC4) for PC2 to PC4 (S14B Fig).
Differentially expressed genes (DEGs) were shared between replicated ecotype pairs significantly more often than expected by chance (Fig 6B), indicating highly parallel divergences in the expression of specific genes (S1 Text, S14C and S14D Fig).
Using a redundancy analysis, we identified 2,921 genes that showed ecotype-associated expression patterns (z-score > 2), and which explained 2.04% of the variation in gene expression (P = 0.008), after correcting for the effect of lake and lineage ( Fig 6C). These genes were involved in a range of biological processes, including cell cycle regulation ( Table), which are processes functionally associated with growth, cell differentiation and gene regulation. Furthermore, 162 of these ecotype-associated, differentially expressed genes were located in 13 identified co-expression modules (Total number of genes in modules: N = 806 genes) that were correlated with ecotype divergence for benthivorous-planktivorous ecotype pairs across lakes and lineages (S14E Fig, S10 Table). This further strengthens the importance of expression changes in gene networks and suggests parallel changes in regulatory networks across lakes and lineages.
Although it is known that plasticity plays an important role in the divergence of Arctic charr [46], we found that ecotype-associated expression in white muscle of wild-caught individuals was in part genetically determined. Using a linear modelling approach, we identified a total of 475 cis-eQTL (FDR < 0.1), and found that the expression of 25 ecotype-associated genes (0.85%) was significantly associated with cis-regulatory variation (cis-eQTL; FDR < 0.1; 9 genes at FDR < 0.05; S11 Table), suggesting a parallel regulatory basis. The most significant ecotype-associated cis-eQTL included the COMTD1-like gene, an enzyme associated with muscle mass in humans [47], TOMM5, a crucial protein involved in mitochondrial protein import, and NR4A1 which is involved in gene expression regulation (Fig 6D-6F; S11 Table).

Predictability of phenotypic and molecular parallelism
To explore the predictability of evolution by the direction and magnitude of phenotypic parallelism, we examined correlations with intrinsic and extrinsic context, such as genomic differentiation, gene expression and ecosystem size.
The magnitude of phenotypic divergence between sympatric ecotypes was positively correlated with genetic differentiation (Fst neut~LP : R 2 = 0.49, P = 0.009; Fig 7A). To remove the effect of population-specific selection and outliers, this Fst neut is based on the 7,179 non-outlier, putatively neutral, loci only (excluding loci with Fst > 95 th percentile from the full dataset of 12,215, see Shared and unique signals of selection). Post hoc exploration of this correlation suggests that genetic differentiation could be partially explained by evolutionary history, because Fst neut was higher in ecotype pairs that diverged under a secondary contact ('pre-LGM +SC') scenario (S15A Fig; Fst neut = 0.186, P = 0.034). However, it is difficult to make more detailed inferences because of the imbalance in the number of populations diverged under the two main scenarios (11 'pre-LGM+SC' vs. 2 'post-LGM').
In order to estimate the impact of genetic variation on the direction of phenotypic change, we compared the similarity of allele frequency and phenotypic trajectories across all population comparisons. While the direction of those 'neutral' allele frequency trajectories between sympatric ecotypes (N Total = 7,179; S15B Fig  In contrast to genomic changes, magnitude and direction of gene expression divergences between sympatric ecotypes were correlated with phenotypic divergence. Specifically, differences in the magnitude of gene expression divergence (ΔL GEx ) positively correlated with differences in the magnitude of phenotypic divergence (ΔL P~Δ L GEx : mantel r = 0.662, P = 0.02; Fig  7C; S15D Fig, S1 Text). The direction of expression divergence of ecotype-associated genes (θ canGEx ) tended to be positively correlated with the direction of phenotypic change (θ P ) (θ P~θcanGEx : mantel r = 0.567, P = 0.07; Fig 7D).
Finally, we investigated the explanatory power of ecosystem size, as a proxy for ecological opportunity [48], for the magnitude and direction of phenotypic, genetic and gene expression divergence. We found that the magnitude of phenotypic divergence was not correlated with ecosystem size, meaning that larger, and putatively more diverse lakes did not lead to stronger phenotypic divergence (ΔL P~d ifference in ecosystem size: mantel r = 0.017, P = 0.442). However, in agreement with earlier studies in Arctic charr [49] and Midas cichlids [48,50], we found that the phenotypic variance (mean trait variance~Ecosystem size: R 2 = 0.73, P = 0.001; S15D Fig) and genetic diversity (Ecosystem size~π: R 2 = 0.54, P<0.001; Fig 8A) scaled positively with ecosystem size, suggesting that populations in larger and deeper lakes were more

PLOS GENETICS
variable. The pattern was particularly influenced by highly genetically diverse populations inhabiting large lakes in Scotland, e.g. Awe and Tay.
We found that variation in ecosystem size was positively correlated with the direction of phenotypic change (θ P~d ifference in ecosystem size: mantel r = 0.30, P = 0.031; Fig 8B) and also the direction of genetic change, for neutral SNPs (θ Gn~d ifference in ecosystem size: mantel r = 0.36, P = 0.011) and ecotype-associated SNPs (θ RDA~d ifference in ecosystem size: mantel r = 0.52, P = 0.013; Fig 8C). Neither differences in the magnitude nor the direction of gene expression divergence were impacted by differences in ecosystem size (ΔL GEx~d ifference in ecosystem size: mantel r = 0.195, P = 0.2; θ GEx~d ifference in ecosystem size: mantel r = -0.21, P = 0.72), suggesting that variation in ecosystem size might not affect this gene expression divergence.

Discussion
With the aim of quantifying the extent of parallel evolution and make inferences about predictability, we integrated phenotypic, genotypic, and molecular data in environmental context for this Holarctic species, Arctic charr. We demonstrated parallel phenotypic changes in several eco-morphological traits and suggest putative drivers and constraints of parallel phenotypic divergence within and across evolutionary lineages. We showed that, while the interaction of genetic differentiation and gene expression divergence determined the magnitude and direction of parallelism, phenotypic divergence is mostly determined by the interaction of environmental variation, putatively adaptive genetic variation, and expression of ecotype-associated genes (Figs 7 and 8). Taken together, our results suggest that repeated selection, particularly on foraging-related traits, led to the parallel evolution of similar eco-morphologies in Arctic charr across geographic and evolutionary scales, and that this occurred in part via similar genetic and molecular pathways.

Phenotypic parallelism within and across evolutionary lineages
We detected substantial variation in the extent of parallelism for overall eco-morphology (based on trajectory analyses, Fig 2B) and for specific traits across independently replicated

PLOS GENETICS
Arctic charr ecotype pairs (Fig 2C and 2D). The extent of phenotypic parallelism for specific traits, as described by the variation explained by the ecotype term (η 2 eco ), ranged from weak parallelism in head length (η 2 eco < 0.1) to medium-strong parallelism in eye diameter (η 2 eco = 0.31-0.57) (Fig 2B and 2C), as classified by Oke et al [10]. Foraging and habitat related traits such as eye diameter and head depth had been suggested to be highly divergent [34,35,51], parallel [27,34], and partially heritable [33,39] in Arctic charr, further suggesting that these traits are under strong repeated natural selection and crucial for the adaptation to replicated trophic niches. Our finding is in agreement with several previous studies on parallel evolution that described a continuum of phenotypic divergence across populations (e.g. for fishes [10,12,38]).
However, most previous studies of phenotypic parallelism have focused on evolutionarily and/or geographically limited comparisons, such as lake-stream or benthic-pelagic sticklebacks from British Columbia [12,25], Trinidad guppies [10], or Midas cichlids in Nicaraguan crater lakes [7,50], potentially leading to generous estimates of parallelism because the effects of evolutionary contingency or environmental stochasticity will be likely minimised in those cases. In our study we detect substantial parallelism in specific traits but also in overall phenotypic divergence across these evolutionarily and geographically distinct lineages (Fig 2). For example, we found benthivorous-planktivorous ecotype pairs from Dughaill in Scotland and Kamkanda in Transbaikalia were highly parallel (θ P = 24.4˚; ΔL P = 0.0152). In fact, phenotypic trajectories for replicated Arctic charr ecotype pairs ( Fig 2B) were overall more parallel than those reported for global lake-stream sticklebacks [12,52]. Together, these results suggest that repeated natural selection can lead to phenotypic parallelism in both single phenotypic traits and overall eco-morphology, even across evolutionary and geographically distinct lineages that have been separated for more than 60,000 years. Yet, deviations from parallelism provide insights into the effects of environmental stochasticity and evolutionary contingency on repeated phenotypic divergence.

Potential drivers and constraints of phenotypic parallelism
Phenotypic parallelism between independently evolved populations has been thought to support a deterministic role of natural selection (reviewed in [10]) with deviations suggesting influences of stochasticity and contingency [67], though this has rarely been quantified. Here, we explored deviations from phenotypic parallelism to identify the potential drivers and constraints of parallel evolution in this broad context. We found that the magnitude of phenotypic divergence across ecotype pairs could be predicted by their neutral genetic differentiation ( Fig  7A). A similar pattern shown for lake-stream stickleback pairs on Vancouver Island [12] was speculated to be due to the reduction of adaptive phenotypic divergence in pairs with more gene flow. Using coalescent models we showed this historical effect on contemporary phenotypic and genotypic differentiation. Specifically, we found that Arctic charr ecotype pairs that likely diverged under gene flow (isolation with migration), rather than having a period of historical isolation, had less genetic differentiation and less phenotypic divergence (S15A Fig, S6  Table). In addition, ecotype pairs that showed lower gene expression divergence were also less diverged in overall eco-morphology (Fig 7C), though with these data we cannot determine how gene expression in this tissue contributes to phenotypic divergence. These results suggest that variation in genetically and environmentally mediated molecular responses facilitate stronger adaptive divergence.
In contrast to magnitude, the direction of phenotypic change was most strongly associated with environmental differences across populations, namely ecosystem size (Fig 8A). Ecosystem size has been suggested to be a good proxy for ecological opportunity in lakes, where larger ecosystems (by lake depth and surface area) harbour more ecological niches [48]. Ecosystem size has been shown to correlated with phenotypic variance in Arctic charr [49] and Midas cichlids [48,50] as well as the extent of planktivorous specialisation in Arctic charr from Greenland [53], supporting our findings that larger lakes support more diverse population (Fig 8). Variation in ecosystem size, and thus ecological opportunity and environmental variation, is potentially associated with context-dependent natural selection, therefore leading to different adaptive optima [68]. This is consistent with the correlation between ecosystem size and the direction of allele-frequency change in ecotype-associated SNPs (Fig 8C), which suggests that environmental differences partly determine adaptive genetic responses and thereby lead to parallel phenotypic responses (Fig 7B). Similar findings have been described in lakestream and marine-freshwater stickleback, with fine-scale environmental variation determining the direction of genetic and phenotypic change [12,54]. This suggests that ecosystem size is potentially a good estimator of ecological opportunity and environmental variation in postglacial lake systems, and that environmental context strongly impacts parallel evolution.
We found that the direction of phenotypic change was additionally correlated with variation in gene expression (Fig 7D), suggesting that similar molecular pathways translate parallel environmental pressures and genetic responses into parallel phenotypic outcomes. Ecotype pairs that were more similar in the expression of ecotype-associated genes, but not overall expression, tended to be more similar in their direction of phenotypic divergence. Patterns of gene expression variation were not correlated with differences in ecosystem size, suggesting that environmental variation related to ecosystem size, i.e. size of the littoral or pelagic zone, does not affect gene expression variation. We speculate that gene expression could compensate for variable genomic underpinnings and facilitating phenotypic parallelism. More research in this area is needed for natural populations.
Our findings suggest that the interplay of environmental variations, stochasticities, and molecular contingencies-for example as evidenced by differences in the direction of adaptive genetic responses-likely shape the extent of phenotypic parallelism in Arctic charr and potentially other postglacial fishes.

Molecular parallelism within and across evolutionary lineages
Contrary to the phenotypic and gene expression patterns, few regions of the genome were shared outliers between ecotype pairs across replicates ( Fig 5A) and genetic parallelism within and across evolutionary lineages was variable, as expected given the continuum of phenotypic parallelism and the relatively recent divergence of populations [13,55]. We used a range of approaches to infer genomic patterns. While the sharing of Fst outlier SNPs (i.e. the top 5% Fst distribution) was low and non-significant (Fig 5A), we found that many SNPs showing elevated differentiation were shared across ecotype pairs. This can be explained by the fact that our reduced-representation sequencing approach did not detect causal SNPs but instead linked SNPs showing elevated differentiation due to hitchhiking [45,56]. As the rate of LD decay differs around outlier loci and demographic processes will vary across ecotype pairs, the level of genetic differentiation in linked SNPs is expected to differ across populations. Consequently, not all SNPs linked to the same causal locus will be statistical outliers (i.e. above the 95 th percentile) but will tend to show elevated differentiation compared to the genomic background [45]. This suggests that genomic responses to selection in these replicate populations are at least partially parallel.
The significant but low number of SNPs that showed repeated ecotype-associated allele-frequency divergence in both evolutionary lineages suggests that inter-lineage genetic parallelism is relatively low. However the same SNPs or genomic regions are associated with repeated ecotype divergences in at least some populations even across lineages, evident by more permissive analytical approaches that did not require a SNP to be divergent between ecotypes in all pairs [57]. These identified significant signals of parallel selection across the genome (Fig 5B) and significant allele-frequency associations with ecotype across lakes within lineages (Fig 5C). The correlation we detected between ecotype-associated loci and the direction of phenotypic change within and across lineages (Fig 7B) suggests the adaptive role of shared genetic variation across lineages. Partial reuse of the same genomic regions across evolutionary lineages may suggest certain constraints on the genetic architecture of adaptive trait divergence and a limited amount of ecotype-associated standing genetic variation. High genetic redundancy in genetic architectures and large amounts of genetic variation would likely lead to lower genetic parallelism [58,59]. More high-density genomic data will be needed to determine the level of genomic parallelism in this system robustly.
Evidence from theory and empirical studies suggests that the extent of genomic parallelism is impacted by factors such as genetic co-ancestry and shared genetic variation [60], demographic history [13], and similarity in selection pressures [52]. Indeed, the higher parallelism of ecotype-associated SNPs within lineages compared to across lineages in these Arctic charr replicates likely reflects the role of genetic co-ancestry and shared adaptive genetic variation in genetic parallelism, in line with observations in other systems [60,61]. The influence of different pre-and postglacial demographic histories is also evident in Arctic charr. For example, whether the mode of speciation was inferred to be secondary contact with admixture or isolation-with-migration is reflected in the contemporary genetic differentiation and phenotypic divergence in Arctic charr; ecotype pairs that likely diverged from a common gene pool following secondary contact and admixture of refugial lineages, e.g. the Siberian populations, are more differentiated (S15A Fig) [34,40]. Finally, similar selection pressures found in similar environments are expected to lead to a component of parallel genetic responses, for example through the use of shared standing genetic variation even across lineages [13,30]. Here we suggest this is reflected in genetic parallelism of ecotype-associated loci correlated with ecosystem size (Fig 8B). Partial genomic parallelism has been detected in other postglacial fishes, such as sticklebacks and lake whitefish [55,62], suggesting that a mixture of parallel and non-parallel genomic responses may be common. Large-scale comparative or experimental studies across species will be needed though to tease apart the effects of environment and genetic background on genomic parallelism.
In contrast to the genomic patterns, we found gene expression divergence to be highly parallel between replicated ecotype pairs both within and across evolutionary lineages (Fig 6A and  6B) We speculate that differential gene expression might facilitate parallel phenotypic evolution despite variation in genetic parallelism in these replicate divergences of Arctic charr, as has been suggested in Littorina snails [63] and Australian groundsel [64]. Co-expression networks of ecotype-associated genes were associated with a range of biological processes such as metabolism, growth, cell differentiation and gene regulation, potentially explaining the observed differences between sympatric ecotypes in morphology and body size, as well as swimming and foraging behaviour [27,35,39]. These are prime candidates for future research on functional genomics of parallel evolution in freshwater fishes.
Like morphology, gene expression is affected by both environment and genotype. Therefore, the component of parallelism in gene expression could be influenced by similar shortterm plastic responses to these similar environments [46]. However, we found the expression of several ecotype-associated genes in benthivorous-planktivorous ecotypes pairs to be regulated by shared cis-eQTL, even across evolutionary lineages (Fig 6D-6F, S11 Table). This suggests that parallel gene expression divergence is in part controlled by a parallel genetic basis, which has also been shown for parallel intercontinental benthic-pelagic divergences in whitefish [65]. Due to the relatively low sample size we likely only detected the strongest cis-eQTLs and underestimated the true number of genes with genetically mediated differential expression. In-depth functional analyses and common garden experiments will be needed to tease apart the effects of environment and genotype on parallel gene expression regulation in Arctic charr and other salmonids.
Overall, we showed that putatively adaptive genetic and molecular parallelism exists within and even across lineages but is variable. We suggest this reflects contingencies of environment and history that influence contemporary phenotypic parallelism. Nonetheless replicated contexts of natural selection facilitating divergence is suggested by parallel genomic responses to ecosystem size. We propose that gene expression is a bridge that facilitates parallel evolution of ecotypes, potentially buffering environmental stochasticities and evolutionary contingencies such as variation in environment, genomic divergence, and evolutionary history.

Conclusion
The evolution of replicated ecotypes has long fascinated naturalists and evolutionary biologists, as it indicates the predictable action of natural selection [66]. Our study demonstrates components of phenotypic parallelism in these replicate divergences of Arctic charr ecotypes within and across evolutionary lineages. We identified components of the genome that are shared and associated with ecotype. We suggest that the extent of parallelism in phenotype and genotype is influenced by stochasticity and contingency, such as environmental variation, demography, and evolutionary history. Some of these influences can be quantified using integrated studies of parallel evolution, allowing better prediction of evolutionary trajectories and response to environmental change. These repeated divergences of Arctic charr provide an example of how replaying the tape of life can lead to repeated and predictable outcomes, contrary to Gould's predictions [67], but also illuminates the variable routes and mechanisms leading to parallel adaptations.

Ethics statement
Fish collection was undertaken under licence from Marine Scotland (CEA) and with local permissions (CEA, SSA).

Arctic charr sampling
Fish were sampled from nine Scottish lakes (Atlantic lineage) and nine Transbaikalian lakes (Siberian lineage) [36], between 1995 and 2015 using standard gill nets (Fig 1A). The sampled lake populations (we refer to all individuals within a lake as a population) contained different numbers and combinations of ecotypes (we refer to trophic specialists as ecotypes). We classified individuals into four ecotypes based on their primary diet (see [26,27,34,35,40] for details): 1) Planktivorous-which feeds mainly on plankton throughout the year, 2) Benthivorouswhich consumes a substantial proportion of benthic invertebrates, particularly during autumn and winter, 3) Piscivorous-which feeds mainly on other fish, 4) Insectivorous-which feeds largely on postlarval stages of insects and 5) unimodal-planktivorous-which represent nondiverged mainly plankton-feeding populations used as outgroups. After collection, we photographed the left side of each fish (Atlantic samples), or individual fish were preserved in formaldehyde for subsequent phenotypic analysis (Siberian samples). White muscle tissue (from underneath the dorsal fin and above the lateral line) and/or fin clips were taken for subsequent genetic and transcriptomic analysis and stored in absolute ethanol or RNAlater at -20˚C. Fish collection was undertaken under license from Marine Scotland and with local permissions.

Analysis of linear morphological traits
Eco-morphological analysis was performed based on seven linear traits, on 345 individuals from the Atlantic lineage and 984 individuals from the Siberian lineage (S1 Table). Seven linear measurements and fork length were taken from photographs using ImageJ v. 1.50i [68] for Atlantic samples or directly from formaldehyde fixed fish for Siberian samples (Fig 2B) [27]: FL-fork length, HDO-head depth at operculum, HDE-head depth at eye, HL-head length, ED-eye diameter, ML-maxilla length, LJL-lower jaw length, PFL-pectoral fin length. Linear traits were chosen based on previous studies on eco-morphological divergence in salmonid fishes and their potential functional importance [27,31,39,69]. Linear traits were correlated with body length, and therefore scaled to mean fork length, using the allometric formula as described in [70]: log 10 Y i = log 10 M i + b � (log 10 Lm-log 10 L i ); where Y i is the corrected trait value, M i is the measured trait value, b is the slope (regression coefficient) of the regression of the trait value against fork length (L i ) within each lake and ecotype, and L m is the mean fork length of all fish within a lineage. The slope was calculated using population and ecotype as covariates. Sizeadjusted measurements were used for all subsequent analyses. Principal component analyses (PCA) were used to uncover the major axes of phenotypic variation in the Atlantic and Siberian lineages, using the ppca approach in pcaMethods (R package) to account for missing data.

Analysis of phenotypic parallelism based on linear traits
To determine the contribution of parallel and non-parallel aspects to the overall morphological divergence of ecotypes, within and across populations, we used the ANOVA (multivariate analysis of variance) approach outlined by Langerhans and DeWitt [17]. ANOVAs (trait/PCẽ cotype + lake + lineage + ecotype x lake + ecotype x lineage) were performed for both lineages combined using individual principal component (PC) scores (PC1 to PC4) and individual linear traits to test for the extent of parallel (ecotype effect) and non-parallel (ecotype x lake interaction (E x L); ecotype x lineage (E x Lin) interaction) phenotypic divergence of sympatric ecotypes across lakes and lineages, and the effect of the unique evolutionary history of each ecotype pair (lake effect and lineage effect) on phenotypic variation across lakes and lineages. We used the EtaSq function in BaylorEdPsych (R package) to estimate the effect size (Wilk's partial η 2 ) of each model term for linear traits and principal component scores. Traits and PCs for which the ecotype term has the largest effect size are highly parallel between ecotypes across lakes and lineages. Those traits and PCs for which the ecotype term explains more variation than the interaction terms (which indicate non-parallel patterns of divergence in magnitude and/or direction), but not more than the lineage and lake terms, are to some degree parallel but are strongly influenced by differences in the evolutionary history between lake populations or lineages.
Furthermore, we performed a complementary phenotypic trajectory analysis (PTA) [16] to quantify the level of parallelism and deviations from parallelism based on all linear traits combined. The PTA was conducted using the trajectory.analysis function in geomorph (R package). The significance of differences in trajectory directions (θ P : differences in the direction of phenotypic change) and trajectory lengths (ΔL P : differences in the magnitude of phenotypic change) was assessed using 1,000 permutations. Ecotype pairs were considered parallel if the angle between trajectories and the differences in trajectory lengths were not significantly different from zero (see [11,12] for details).

Chromosome assembly and annotation of Arctic charr draft genome
We created a draft chromosome-level assembly of the Arctic charr genome based on an Arctic charr linkage map [71] and synteny with the Atlantic salmon reference genome [72] based on Chromosomer analysis [73]. The assembly was created using All-Maps [74]. We created a draft annotation using GeMoMa 1.4.2 [75] and the quality of gene predictions was evaluated using BUSCO v.1.22 [76]. Arctic charr and zebrafish (Danio rerio) orthologues were identified using Orthofinder [77] (see S1 Text for details). The present analyses are based on an early assembly of the Arctic charr genome and an updated chromosome-scale assembly of the Arctic charr genome was recently published [78].

DNA extraction and ddRADseq
DNA was extracted from fin clips and muscle tissue using the NucleoSpin Tissue kit (Macherey-Nagel), following the manufacturers recommendations. DNA quality and quantity were assessed using agarose gel electrophoresis and the Qubit Fluorometer with the dsDNA BR Assay (Life Technologies). ddRADseq libraries were prepared using a modified version of the Recknagel et al. (2015) [79] ddRADseq protocol for Illumina sequencing platforms. Paired-end 75-bp sequencing was performed on the Illumina NextSeq500 platform at Glasgow Polyomics (University of Glasgow) at 3-4M read coverage per individual.

Amplification, sequencing and analysis of the mitochondrial ND1 gene
The mitochondrial ND1 gene was amplified for 107 individuals (between 2 and 11 individuals per population) using the primer-pair B1NDF/B1NDF and PCR conditions as described in Schenekar et al. (2014) [80]. The PCR product was cleaned, and Sanger-sequenced in both directions at DNA Sequencing and Services (MRC I PPU). Contigs were assembled from forward and reverse reads using Sequencher v.5.4 (http://www.genecodes.com/) after removing low quality reads and trimming read ends. Reads for all individuals were aligned using Muscle in MEGA v.7 [81] and trimmed to a common length. A TCS haplotype network was built in POPART [82].

Processing of ddRADseq data
Raw sequence data quality was assessed using FastQC v.0.11.3 (http://www.bioinformatics. babraham.ac.uk/projects/fastqc). The process_radtags pipeline in Stacks version 1.46 [83] was used for demultiplexing raw sequence data based on unique barcodes, quality filtering and read-trimming to 70bp of all libraries. Processed reads were aligned to the Arctic charr draft genome with bwa mem v.0.7.15 using a seed length of 25bp and the-M option. Reads with mapping quality <20 were removed using samtools v.1.6. We used Stacks v.1.46 and the ref_map.pl pipeline for building RAD-loci and SNP calling. The populations module was used to export genotype calls in VCF format for further filtering in vcftools v.0.1.15. We created three different datasets; a global dataset for the Atlantic and Siberian lineages combined, and separate datasets for each lineage. SNPs were retained when the following criteria were fulfilled: (i) present in at least 66% of all individuals within a population and 2/3 of all populations, (ii) global minor allele frequency (MAF) � 0.05 or � 0.01 for the global dataset, (iii) heterozygosity � 0.5 (iv) in Hardy-Weinberg equilibrium (P > 0.05) in at least 2/3 of all populations and (v) with a minimum coverage of 6x. Filtering and conversion of data into the different formats was performed using Stacks, vcftools, PLINK v.1.90 and PGDspider v.2.11.2. Datasets for creating site frequency spectra were filtered in a similar way, except no MAF cutoff was used in order to retain informative low frequency sites and a maximum of 10% missing data within each dataset (e.g. each ecotype pair) was allowed. For all analyses, only one SNP per locus was retained to reduce the effect of linkage.

Summary statistics and analysis of population structure
To assess population structuring across and within lineages, and within lakes, we applied multiple approaches using the global and lineage-specific SNP datasets. First, we used PCA in adegenet (R package) to assess major axes of genetic variation. Second, Admixture v.1.3 [84] was used with a ten-fold cross-validation to detect the most likely numbers of clusters and genetic ancestry proportions within lineages. Genodive v.2.0b27 [85] was used to estimate pairwise genetic differentiation (Weir-Cockerham Fst) between ecotypes within and across lakes using the global dataset, with 10,000 permutations. We calculated genome-wide nucleotide diversity for each ecotype for all populations based on all SNPs using vcftools. We assessed the relationship among populations within and across lineages using a neighbour-joining splits network using SplitsTree4 v.4.14.4 [86]. To assess genetic co-ancestry among individuals within and across lakes, we used haplotype-based population inference approach implemented in fineR-ADstructure v.0.1 [87], using the same filtering criteria described for the SNP dataset. Analyses were performed using default settings.

Introgression and differential admixture
We used Treemix v.1.13 [88] to explore and visualize secondary gene flow within each lineage. We built maximum-likelihood trees for non-admixed individuals (admixture threshold of 0.25 as inferred with Admixture) to reduce the effect of contemporary admixture. We fitted up to six and ten migration edges for the Atlantic and Siberian populations, respectively, and chose the most likely migration events based on the maximized variance explained, maximum-likelihood and the significance of each migration edge. To formally test for introgression in a fourpopulation tree ('deviation from tree-ness') we used f4-statistics implemented in the fourpop function of Treemix and D-statisitcs (Abba-Baba test) implemented in Dsuite [89]. We used maximum likelihood trees for each lineage for the Abba-Baba test rooted with a single population from the other lineage (Davatchan and Dughaill). We focused on the Tree-based estimate of the D-statistic but report the Dmin-based and BBAA-based estimates for completeness. Furthermore, to test for significant admixture within a population in a three-taxon comparison (Target taxon C; Reference taxon A, Reference taxon B) we used f3-statistics as implemented in the threepop function in Treemix.

Inference of evolutionary histories
To distinguish between alternative evolutionary scenarios leading to ecotype diversity within lakes, we used coalescence simulations implemented in fastsimcoal2 v.2.5.2.3 [43] and information contained in the multidimensional site frequency spectrum (SFS). 2-population and 3-population SFSs were created using @a@i v.1.6.3 [90] for each population and parapatric outgroups if appropriate. Populations were downsampled by around 30% to reduce the effect of missing data. The minor folded site frequency spectrum was used due to the lack of a trinucleotide substitution matrix for salmonids and sequencing data for outgroup species. To determine absolute values for divergence times and other inferred parameters, we corrected the number of monomorphic sites in the SFS [91,92]. In brief, since only one SNP per RAD tag is retained, the ratio of SNPs to invariant sites in the spectrum is skewed. Thus, one has to calculate the actual ratio of SNPs to invariant sites in the initial dataset and use that to adjust the number of monomorphic sites [92]. A mutation rate of 1x10 -8 was used as no known mutation rate for Arctic charr or salmonids was available [93].
For all sympatric ecotype pairs, seven pairwise demographic models describing different historical divergence scenarios were tested (S6 Fig): Strict isolation (SI), Ancient migration (AM), Isolation-with-migration (IM), Secondary contact (SC), Secondary Contact with introgression (AdmSC), Isolation-with-migration with a historical change in migration rate (IMchange) and an IM-model with a historical introgression event between sympatric ecotypes (IMint) were tested. In lakes with three sympatric ecotypes (Kamkanda and Kalarskii Davatchan), or strong admixture across lakes (Loch Dughaill and Loch Uaine), models describing different combinations of strict isolation, isolation-with-migration, secondary contact, introgression and hybrid speciation were tested for all three ecotypes/populations together. These models tested in general two different evolutionary histories. The isolationwith-migration models test a history of divergence under constant gene flow (with differing rates), whereas secondary contact models mainly test the occurrence of historical secondary contact between different distinct lineages or populations (e.g. distinct gene pools or glacial refugial populations) prior to ecotype divergence.
We ran a total of 30 iterations for each model and lake, and selected the most likely model based on the AIC [43]. Each run consisted of 40 rounds of parameter estimation with 100,000 coalescent simulations. We only used variants with a minimum count of 2 in the SFS to filter out low frequency variants. Point estimates of inferred parameters were taken from the most likely model and averaged over the top five runs.

Patterns of selection and differentiation
To determine outlier loci between sympatric ecotypes, we screened the genome for loci showing genetic differentiation (Weir-Cockerham Fst calculated using vcftools) above the 95 th quantile of the Fst distribution (outlier loci). We estimated the number and proportion of shared outlier SNPs and contigs containing outlier SNPs (but not the same SNPs across contigs) across benthivorous-planktivorous and piscivorous-planktivorous ecotype pairs. To determine if more outlier SNPs or contigs are shared among two ecotype pairs than expected by chance, we used resample (R package) to resample n number of SNPs (n = number of outlier SNPs for each ecotype pair) 10,000 times with replacement from the full dataset and determined the mean number of shared SNPs from that distribution. We calculated proportionbased p-values based on the number of observed and expected shared outlier SNPs/contigs and the total number of SNPs for each pairwise comparison using R function prop.test. We also plotted the standardised Fst (ZFst) against delta nucleotide diversity (Δπ = π ecotype2 − π ecotype1 ) between sympatric ecotypes, to determine if outlier loci show reduced genetic diversity in one or both ecotypes, potentially indicating selective sweeps.
Furthermore, to assess the impact of differences in evolutionary history and demography on our ability to detect Fst outlier loci, we implemented a custom permutation approach. To obtain a Fst null distribution under panmixia, we randomly assigned ecotype labels to individuals within lakes and then calculated Fst using vcftools. We repeated this analysis 1,000 times and calculated the null distribution of Fst values and the 95 th percentile for each ecotype pair. Subsequently, we identified outlier loci as those with empirical Fst values above the permutated 95 th percentile and repeated the outlier sharing analyses (above) for this 'permutated outlier SNP set'. We only repeated the analysis for individual SNPs and not shared contigs.
Additionally, we used a hierarchical implementation of bayescan to jointly test for signals of parallel selection across replicated ecotype pairs within lineages using default settings [57].

Genome-wide association analysis
To detect loci significantly associated with ecotype within each lineage we used a redundancy analysis (RDA), controlling for the effect of lake (condition) using vegan (R package). Ecotypes were coded numerically, with planktivorous as 0, benthivorous as 1 and piscivorous as 2. We excluded unimodal populations and ecotypes from Tokko from this analysis. SNPs were selected as significantly associated with ecotype if the z-transformed loading for RDA1 (and RDA2 in the Siberian population) was above 2 or below -2 (equivalent to a two-tailed pvalue < 0.05). To test if more ecotype-associated SNPs were shared across the lineages, we used the same resampling approach as for the outlier SNPs.
We imputed missing data in each SNP dataset using the LD-kNNi method implemented in Tassel5 [94], based on the 10 closest genotypes using the default settings. To test the imputation accuracy, we calculated Pearson's correlations between allele frequencies before and after imputation for the full dataset and the subset with the highest proportion of missing data.

RNAseq and processing
Total RNA was extracted from white muscle tissue from 44 individuals (N = 4 per ecotype per lake) from five lakes (Awe, Tay, Dug, Kam, Tok), representing all possible ecotypes (benthivorous, planktivorous, insectivorous, piscivorous), using PureLink RNA Mini kits (Life Technologies, Carlsbad, CA). Both sexes were sampled and used in roughly equal ratios. Extractions were carried out following the manufacturer's instructions, with the exception of an additional homogenization step using a FastPrep-24 (MP Biomedicals) prior to isolation. RNA quantity and quality were assessed using the Qubit 2.0 fluorometer (Life Technologies, Carlsbad, CA) with HS Assay kits and a 2200 Tapestation (Agilent, Santa Clara, CA), respectively. High quality RNA was achieved, with A260/280 ratios between 1.9 and 2.1 and RNA Integrity Numbers above 8.3.
RNA-seq libraries were prepared and sequenced at Glasgow Polyomics (University of Glasgow) for Awe, Tay, Dug and Kam and at BGI (Shenzhen, China) for Tok. Individual cDNA libraries were prepared for each individual using the TruSeq Stranded mRNA Sample Preparation kit (Illumina, San Diego, CA) in combination with a Poly-A selection step. Libraries were sequenced on an Illumina NextSeq 500 (Illumina, San Diego, CA) using 75-bp paired-end sequencing, at a depth of 25-30M reads per library. Raw reads were processed using Scythe v0.9944 BETA (https://github.com/vsbuffalo/scythe/) and Trimmomatic v0.36 [95]. Leading and trailing bases with a Phred quality score <20 were removed and a sliding window approach (4 bp window size) was used to trim reads at positions with Phred scores below 20. A minimum read length of 50 bp was allowed. We used FastQC v0.11.2 to assess read quality before and after processing. Processing removed~2% of reads, resulting in 1.81 billion cleaned reads. The resulting reads were aligned against the Arctic charr draft genome using STAR v2.5.2b [96], with default parameters. Raw reads were counted for each gene based on the longest isoform annotation using the HTSEQ-count python package [97] with the unstranded (-stranded = no), CDS-based (-type = CDS,-idattr = Parent) settings. Only genes with at least 20 read counts per lake were used for further downstream analyses.

Gene expression analysis
To identify the major axes of expression variation across lakes, we performed a principal components analysis using the svd PCA approach in pcaMethods (R package) based on rld-transformed gene expression data (transformed using the DEseq2 R package) [98]. The raw read count table was used for the differential gene expression analysis between sympatric ecotypes using DESeq2 on a per lake basis. Furthermore, to identify genes with ecotype-associated expression patterns, we performed a RDA on rld-transformed count data using the vegan R package controlling for lineage and lake (conditions). We again used z-transformed loadings above 2 or below -2 as the significance threshold.
To further examine the functional bases of trophic divergence Arctic charr, we used a Weighted Gene Co-Expression Network Analysis (WGCNA) to identify co-expressed gene modules [99]. Network analyses were only performed on benthivorous-planktivorous ecotype pairs (from Awe, Tay, Dug and Kam; N = 32, four per ecotype, per lake). To reduce stochastic background noise from lake-specific effects in our expression data, we used a linear mixed model in variancePartition (R package) to identify a subset of genes with expression variation attributed to ecotype (see SI for detail). All genes for which 'ecotype' explained more than 10% of the total expression variation across individuals were used for network construction. A single network was constructed for all 32 samples and 1,512 ecotype-associated genes, from the log 2 scaled count data (DESeq2: rlog), using WGCNA (R package), following the standard procedure. Network modules were defined using the dynamic treecut algorithm, with a minimum module size of 25 genes and a cut height of 0.992. The module eigengene distance threshold was set to 0.25 to merge similar modules. To determine the significance of module-trait relationships, Pearson's correlations were calculated between module eigengenes (the first principal component of the expression profile for a given module) and lake and ecotype. P-values were Benjamini-Hochberg corrected (FDR<0.05).

Cis-eQTL mapping
To determine if the expression of candidate genes is genetically determined we performed cis-eQTL mapping using all benthivorous-planktivorous ecotype pairs (N = 32). First, we called SNPs from reference-aligned RNAseq data using freebayes (https://github.com/ekg/freebayes), after marking duplicates using picard, with a coverage threshold of three. We only retained biallelic SNPs with a phred quality score above 30, a genotype quality above 20 and an alleledepth balance between 0.25 and 0.75. Furthermore, we filtered for Hardy-Weinberg disequilibrium (p-value threshold < 0.01), and only kept sites that were present in at least 90% of all individuals across populations. The filtering was performed using the vcffilter command implemented in vcflib and vcftools. Using these filtering steps, we retained 12,393 SNPs.
To associate gene expression with sequence polymorphisms and identify cis-eQTL, we used MatrixEQTL v.2.2 (R package) [100]. We used a linear model with lake and lineage as covariates, and a maximum distance of 1 Mbp between SNP and differentially expressed gene (cisacting polymorphism only). Cis-eQTL were identified with a false-discovery rate (FDR) below 0.1 after correcting for multiple testing. Due to the low sample size and low statistical power, we focused our interpretation only on cis-eQTL for ecotype-associated genes (identified using RDA based on gene expression). However, we also report results for an FDR of 0.05.

Characterisation of differentially expressed genes
To detect genetic pathways associated with ecotype-associated differentially expressed genes (identified using RDA) and co-expressed gene modules, we performed overrepresentation analyses using the WebGestalt tool [101]. We only used Arctic charr genes which had 1:1 or 2:1 orthologs in the zebrafish genome, with the following settings in Orthofinder: minimum number of genes for a category = 5, maximum number of genes for a category = 500, number of permutations = 1000, number of categories with leading-edge genes = 20, KEGG pathways, organism = Danio rerio.

Gene sharing between comparisons
To calculate the expected number of shared differentially expressed genes (DEGs) between comparisons we used a permutation-based approach, similar to the outlier comparison, with 10,000 permutations. We randomly sampled N genes (N = the number of DEGs in a comparison) 10,000 times from each dataset with replacement and calculated the expected number of shared DEGs as the mean number of shared resampled genes in each comparison. We calculated proportion-based p-values based on the number of observed and expected shared DEGs, and the total number of genes for each pairwise comparison using R function prop.test.

Trajectory and regression analysis
Similar to the phenotypic trajectory analysis, we performed trajectory analyses (TA) based on different genetic datasets and gene expression data. The TA was performed using the geomorph trajectory.analysis function based on PC scores derived for each dataset. For the genetic data, we calculated trajectory lengths and angles for all neutral SNPs (N = 7,179 SNPs, PC1-6; θ Gn , and ΔL Gn ) and ecotype-associated SNPs (N = 217 SNPs in the Atlantic lineage, PC1-4; N = 582 SNPs in the Siberian lineage, PC1-6;θ RDA , ΔL RDA ). Except for the ecotype-associated SNPs, the TA was performed for both lineages combined. We also performed TA based on PC1-6 for all expressed genes (θ GEx , ΔL GEx ) and for PC1-5 based on all genes associated with ecotype in the RDA (θ canGEx , ΔL canGEx ). In all cases, we selected all PCs that cumulatively explain more than 50% of variation.
To identify how the different factors (phenotype, genotype, evolutionary history and gene expression) are correlated, we performed linear regression analyses using the lm function (for independent datasets) and Mantel tests (for non-independent datasets; Pearson correlation) in vegan using the different input datasets. First, we compared how differences in the angles between trajectories and lengths of trajectories, calculated based on different datasets (linear traits (θ P , ΔL P ), neutral SNPs, ecotype-associated SNPs, overall gene expression and ecotypeassociated expression), are correlated using Mantel tests. Furthermore, we determined how absolute magnitudes of divergence (absolute length of trajectories for the same datasets (L), and neutral and outlier-based Fst) are correlated using linear regressions.
Furthermore, we estimated the correlation between ecosystem size and trajectory lengths and directions. We used the first PC from a PCA performed using prcomp based on maximum lake depth and surface area, estimated using Google Earth Pro, as a proxy for ecosystem size [49]. We then calculated the Euclidean distance between PC1 scores for each pairwise comparison as a proxy for environmental distance, and tested its correlation with differences in trajectory lengths (ΔL) and trajectory directions (θ) using Mantel tests for phenotypes, neutral SNPs, ecotype-associated SNPs, gene expression and ecotype-associated gene expression. We also tested the effect of ecosystem size (PC1) on population nucleotide diversity and divergence patterns using linear regression analyses.  Fig. Haplotype network for the mitochondrial ND1 gene (N = 107). The size of each circle corresponds to the number of individuals sharing a haplotype. When sympatric ecotypes share one or several haplotypes than the circles or pies are only coloured by lake of origin. However, when sympatric ecotypes have distinct haplotypes, then each circle or pie is coloured by ecotype.  Table. (TIFF) S10 Fig. Genetic differentiation and selection. (A) Genome scan plot with top 5%-quantile Fst outlier loci highlighted in red. Chromosomes are alternatingly highlighted by blue boxes, and unplaced scaffolds are placed on the right side. (B,C) Genome-wide patterns of differentiation between sympatric ecotypes in the (B) Atlantic and (C) Siberian lineage. The z-transformed Fst (ZFst) is plotted against the delta nucleotide-diversity (Δπ; benthivorousplanktivorous and piscivorous-planktivorous). If the Δπ deviates from zero, it shows genetic diversity at this locus is reduced in one of the ecotypes. Loci with ZFst values above 4 were inferred to be significantly differentiated (red dots) and loci with ZFst above 3 (blue dots) are also reported.  Table. The most likely demographic models for each lake population. Shown are the four best fitting models for each population/comparison and the respective ΔAIC between them. (DOCX) S8 Table. Genes containing ecotype-associated SNPs identified in the cRDA analysis in both lineages. (DOCX) S9 Table. Gene ontology overrepresentation results for ecotype-associated expressed genes (RDA). (DOCX) S10 Table. WGCNA networks. Gene co-expression modules in network generated from 1,512 ecotype associated genes. (DOCX) S11 Table. Cis-regulated ecotype-associated gene expression. Ecotype-associated expressed genes that are associated with cis-eQTL. (DOCX)