Convergence in form and function overcomes non-parallel evolutionary histories in Arctic charr

Understanding the extent to which evolution is predictable is fundamental to advancing research on biodiversity origins. We studied the evolution of ecotypes in Arctic charr in 18 freshwater lakes and two evolutionary lineages. We find significant phenotypic parallelism across replicates, with the degree of parallelism most pronounced in adaptive traits associated with foraging. Genome-wide analyses of the underlying population variation revealed a strong influence of historical demography and gene flow. Some divergent genomic regions were significantly shared across ecotype replicates, though there was a high proportion of population-specific variation. Functional molecular bases to ecotypes in gene expression and biological pathways were extensively and significantly shared across ecotype pairs. These results demonstrate a spectrum of predictability, from low parallelism at the nucleotide level to high parallelism in regulatory molecular bases and functional adaptive aspects of morphology, and advance our understanding of parallel evolution through divergent evolutionary routes.


Abstract:
Understanding the extent to which evolution is predictable is fundamental to advancing research on biodiversity origins. We studied the evolution of ecotypes in Arctic charr in 18 freshwater lakes and two evolutionary lineages. We find significant phenotypic parallelism across replicates, with the degree of parallelism most pronounced in adaptive traits associated with foraging. Genome-wide analyses of the underlying population variation revealed a strong influence of historical demography and gene flow. Some divergent genomic regions were significantly shared across ecotype replicates, though there was a high proportion of populationspecific variation. Functional molecular bases to ecotypes in gene expression and biological pathways were extensively and significantly shared across ecotype pairs. These results demonstrate a spectrum of predictability, from low parallelism at the nucleotide level to high parallelism in regulatory molecular bases and functional adaptive aspects of morphology, and advance our understanding of parallel evolution through divergent evolutionary routes.
traits, except maxillary length in piscivorous ecotypes, were significantly parallel across lakes though also strongly influenced by unique evolutionary histories of populations (h 2 Ecotype <h 2 Lake ) (table S4). Thus, despite a considerable level of non-parallelism in morphology within and across lineages, there is significant parallelism amongst components of fish morphology that have functional significance for foraging efficiency (head shape, eye diameter) (22) across replicates.
To determine if parallel ecotypes have evolved via similar population genetic processes and under contemporary environmental influences, we analyzed a set of up to 12,215 singlenucleotide polymorphisms (SNPs) in 648 individuals from 18 populations. The degree and pattern of genetic differentiation (Fst) differed extensively within and among lakes, forming a continuum ( fig. S9, table S8) that was not strictly associated with the geography of populations, and suggested important differences in evolutionary history and reproductive isolation among populations (22). For example, variation in genetic differentiation among ecotypes within lakes was positively correlated with the mean degree of phenotypic differentiation (fig. S10, table S9: Pearson's r outlierSNPs = 0.695, P = 0.038; Pearson's r neutralSNPs = 0.648, P = 0.062), consistent with isolation-by-adaptation (12,23). We predicted that some of this variability in genetic and phenotypic divergence may be due to local environmental differences (9), as has been found to be important in charr (24). Indeed, the level of genome-wide genetic diversity across lakes was mainly explained by ecosystem size ( To understand if replicate populations in these environments have evolved by parallel evolutionary routes, we assessed demographic and evolutionary genetic histories using genomewide SNPs and mitochondrial haplotypes. Population genetic ancestry and phylogenetic patterns ( Fig. 2; figs. S9, and S11 to S15) revealed strong variation in the degree of differentiation and admixture among ecotypes and populations. Most populations form genetically distinct clusters (figs. S12 to S14) but varying levels of shared ancestry across populations and ecotypes indicate histories of differential secondary gene flow or ancestral population structuring (Figs. 2D and 2E, figs. S13 and S14). In two cases populations formed polyphyletic genetic clusters, i.e. included both sympatric ecotypes and a nearby generalist lake population (Dug-Uai and Kir3-Kir4), which suggested strong secondary gene flow or shared ancestry of similar ecotypes ( Fig.  2A to 2C). Discordance in population structure between mitochondrial and nuclear DNA further supports a complex history of divergence and admixture, with near complete haplotype sharing across ecotypes ( fig. S16). Formal analysis of introgression and secondary gene flow using f3and f4-statistics found evidence for extensive secondary gene flow among most populations in both lineages ( Fig. 2B and C; tables S10 and S11), and evidence for introgressive hybridization in the planktivorous ecotype in Dughaill and the small-piscivorous ecotype in Kalarskii Davatchan ( Fig. 2B and 3C, table S11). Coalescence simulations implemented in fastsimcoal2 (25) suggested that most ecotype pairs likely evolved via secondary contact that admixed ancestral lineages. Considerable variation exists in inferred divergence times (910 -16,690 generations) and admixture proportions (0.003 -1.0) across lakes (figs. S17 and S18; table S12). Models suggest that during the last glacial maximum (LGM) ancestral populations in Siberia were strongly isolated (fig. S18) while in the Atlantic lineage weak gene flow was detected during the LGM (fig. S18), adding to the evolutionary contingency across lineages. In two Scottish populations with low contemporary genetic differentiation between ecotypes (Fst = 0.002 and 0.001; Fig. 2D, table S8), we found evidence for sympatric divergence of ecotypes after the LGM (isolation-with-migration model; T DIV, Awe = 1245 generations, T DIV, naS = 4795) (fig. S18). However, in most cases, models including recent introgression are better supported than those without (figs. S17 and S18; table S12), suggesting a pervasive effect of secondary contact and introgression in sympatric ecotypes, similar to other polymorphic postglacial fish species (19,26). Subsequent to complex histories, genetic and ecological data suggest most ecotypes likely diverged in sympatry (27), facilitated by genetic variation that arose in allopatry across ancestral populations. Thus, findings from multiple approaches suggest that despite replicated parallel evolution of phenotype within and across lineages, there is a high level of variation across populations and lineages in evolutionary genetic and demographic histories.
To test the extent to which evolution overcomes complex differing genomic histories resulting in parallelism, we compared genome-wide patterns of differentiation across ecotype pairs. We found one large genomic region of 4.2 Mb (Contig985 on LG1) that exhibited extreme differentiation (top 5% Fst outlier-SNPs) in five out of six benthivorous-planktivorous ecotype pairs and one piscivorous-planktivorous pair (fig. S19), strongly suggesting that this region plays a potentially important role in benthivorous-planktivorous divergence. However, in most cases replicate ecotype pairs and lineages shared fewer outlier genomic regions and loci under selection than expected by chance, when comparing particular outlier SNPs or their surrounding regions (contigs) ( Fig. 3A and 3B, figs. S19 and S22). Patterns of genome-wide differentiation and diversity were highly variable across populations, strongly influenced by the underlying evolutionary history (figs. S19 and S21, table S8); populations for which we had inferred histories of secondary contact had few outliers and symmetrical values of ZFst relative to ∆π, while populations with ecotypes that diverged recently under gene flow without ancestral secondary contact (by IM) showed reduced π in one compared to other ecotype in differentiated loci ( fig. S21), typical of response to selection. Thus, despite evidence for parallel divergence in a small number of genomic regions across two or more ecotypes, most genomic response to selection is unique and contingent on demographic history.
To identify parallelism in the functional genomic basis underlying these ecotypes, we compared genome-wide association results between lineages. Loci associated with ecotype were widely distributed across the genome and not shared across lineages ( fig. S22). Although candidate loci from association analyses were non-parallel across the evolutionary lineages, we found that the allele frequencies were to some extent parallel across replicate ecotypes within lineages ( fig. S23; table S13; Scotland: h 2 ecotype, PC4 = 0.70, h 2 ExL, PC4 = 0.30; Siberia: h 2 ecotype, PC1 = 0.943, h 2 ExL, PC1 = 0.508). Similar to the phenotypic analysis, GWAS candidate loci showed a generally higher level of parallelism within the Siberian lineage. These findings demonstrate that the genetic basis of ecotype differs across lineages in the set of loci we interrogated.
Given the non-parallelism in genomic patterns but parallelism in phenotype replicates, we hypothesized that regulatory variation would be parallel among ecotypes. We analyzed genomewide gene expression for 44 white-muscle transcriptomes between ecotypes from a subset of five lakes (table S14) and compared differential expression, co-expression modules, and biological pathways. Similar to population genetic patterns, we found a continuum of differentiation in gene expression across ecotype pairs ( fig. S24). Contrary to the genomic analysis, differentially expressed genes (DEGs) were shared between parallel ecotype pairs more often than expected by chance ( Fig. 3D; table S14) and modules of co-expressed genes ( fig. S25; table S16) were correlated with ecotype across populations and lineages. Analyses of biological pathways found three characterized genes that were consistently differentially expressed in five out of seven ecotype pairs: ABCC8, NDPK, and ALDOA (Fig. 3C, table S15 and S18). These genes are biologically relevant because of their association with growth rate, body size, and metabolism (28-30), phenotypes that differ significantly between ecotypes (16-20), including in our morphology analyses ( fig. S4). Other biologically relevant DEGs were shared between up to four pairs (19 genes, table S18), such as hemoglobins HBA-4 and HBB-1 ( fig. 26) which might facilitate different swimming performance of ecotypes, and RERG, a growth inhibitor (Fig. 3E). We found RERG expression is correlated to the mean body size of ecotypes across all our populations, an important trait that differentiates ecotypes in many populations ( Fig. 3E; fig. S4). DEGs were enriched for 15 functional biological pathways and some were shared between multiple populations (table S19), such as oxidative phosphorylation (Awe, Tay, Kam bn-pisc; dre00190, FDR < 0.1), ribosome biogenesis (Awe, Tok; dre03008, FDR < 0.1), and ECMreceptor interaction (Awe, Tok; dre04512, FDR < 0.1). An additional seven pathways (in four modules) were significantly enriched across co-expression modules (table S20), of which two were shared with DE gene sets: ECM-receptor interaction, dre04512 and glycerolipid metabolism, dre00561 (tables S17 and S19). Genes that we had found to be associated with ecotype divergence or to be under selection at the genome level, were also involved in some these pathways, such as the ECM-receptor interaction or ABC transporter pathways (tables S20 and S21) in the Siberian or Atlantic populations, suggesting that gene expression differences have at least partially an underlying genetic basis. Genes in highly differentiated regions (Contig 985), and regions under selection and associated with ecotype in both lineages (Contig 1512), were also involved in differentially and co-expressed pathways (e.g. fructose and mannose metabolism, glycloysis, phosphatidylinositol signaling system; Contig 985), or significantly enriched for a co-expressed pathway (ErbB signaling pathway (dre04012); Contig1512), that play important roles in growth, development and performance (see Supplementary text). These demonstrate that there is a significant level of parallelism in the functional genomic underpinnings of replicate ecotypes in charr.
Our findings demonstrate parallel phenotypic changes in several traits associated with foraging, morphology, and ecological divergence in Arctic charr over a large number of replicate but variable environments. This parallelism has evolved within and across two evolutionary lineages, spanning time scales from hundreds to tens of thousands of years. We also demonstrate a pervasive role of population genetic structure, genomic architecture, and demographic history that strongly influence non-parallelism at the genome level. We find that gene expression in an ecologically relevant tissue, white muscle, was significantly parallel across ecotypes and that candidate genes and functional pathways encompass many aspects of phenotype, such as growth and metabolism, giving a distinctive insight to the functional basis of replicate ecotypes of fishes in their natural environment and suggesting a mechanism by which eco-morphological parallelism is achieved. Thus, our findings highlight a predictability of evolution 'replaying the tape of life' (2), overcoming constraints of stochasticity and contingency of different evolutionary and genomic backgrounds, in one of the most expansive natural experiments in replicated evolution (9).  (table S6), although overall divergence is highly parallel (table S7). Wireframes show the phenotypic extremes along PC1, highlighting the differences between benthivorous (orange) and planktivorous (blue) ecotypes, particularly in eye and head size relative to body size. Ellipses show 95% confidence intervals by ecotype. (B) Genetic diversity is positively correlated to ecosystem size, independent on the number of ecotypes within a population [Bimodal (bi) with two ecotypes, trimodal (tri) with three ecotypes and unimodal (uni) with only one or no ecotypes]. Grey areas around the regression line show 95% confidence intervals. (C) Divergence in morphology based on linear discriminant analysis (LD1) of six linear morphological traits (N = 1529). (D) Trajectories for individual phenotypic traits in benthivorous-planktivorous and piscivorous-planktivorous ecotype pairs (red or blue in either direction), with exemplar images of ecotypes above.    Fig. 3. Quantification of parallel and non-parallel genetic and transcriptomic divergence across ecotype pairs. (A-B) The majority of divergent SNPs (above 95 th quantile of empirical Fst distribution) were unique to a single ecotype pair and not significantly shared more often than expected by chance (based on permutation test of observed versus expected number of shared outlier SNPs, P < 0.05). (C) Differential gene expression across all populations for three genes significantly shared. (D) Genes that are differentially expressed across ecotype pairs (extent of sharing weighted by color). Significant comparisons are highlighted with an asterisk. (E) Expression patterns of RERG, a growth inhibitor by ecotype that correlates with mean body size across ecotypes.

Materials and Methods
Study populations, ecotype classification, sample collection and processing Arctic charr were sampled from nine Scottish lakes (Atlantic lineage) and from nine Transbaikalian lakes (Siberian lineage) (31), between 1995 and 2015 using standard gill nets ( fig. S1). The sampled lake populations (we refer to all Arctic charr individuals within a lake as population) contained different numbers and combinations of trophic specialists (we refer trophic specialists as ecotypes), and detailed sample sizes for each analysis, locations and ecotype compositions are given in table S1.
We distinguished and classified ecotypes based on primary diet (trophic) (see (16,17,19,20,(32)(33)(34)(35) for detailed overviews on trophic and phenotypic divergence). Although proportions of main food sources (benthic, plankton, insects and fish) differ across lakes and some populations are more opportunistic than others, we distinguished four main ecotypes in this study, based on established classifications (i.e. (16) for an overview) and to allow comparability across lakes and lineages: the planktivorous ecotype that mainly feed on plankton throughout the year; the benthivorous ecotype that also consumes a substantial proportion of benthic diet, particularly during autumn and winter; the piscivorous ecotype that mainly feeds on other fish; and the insectivorous ecotype for which terrestrial insects make up a large proportion of their diet. Although many ecotypes supplement their diet with plankton during certain times of the year when it is abundant, for example during the spring/summer algae bloom, ecotypes strongly differentiate based on diet (trophic niche) during resource-limited periods of the year and this is how we have classified them here. Many of these ecotypes also differ in body size, development and growth rate (17,19,33,35,36).
We sampled a total of nine populations from the Atlantic lineage (table S1). Four of the nine Scottish populations (na Sealga, Tay, Awe and Dughaill) contained two ecotypes, benthivorous and planktivorous (19,20,32). We further sampled four populations (Eck, Lubnaig, Uaine, Merkland) containing only one non-divergent (generalist)-planktivorous populations (unimodal-planktivorous), (37), as outgroups for the phenotypic and genomic analysis. All Scottish populations except Loch na Sealga have been extensively studied ecotypes (19,20,32,37), but ecological and morphological data for the Loch na Sealga population are presented as part of this study (see below). Although Loch Rannoch harbours three different ecotypes (38), samples used in this study include representatives of at least two ecotypes and were used as genetic outgroup samples.
All Siberian populations used in this study were multimodal, harbouring different combinations of benthivorous, planktivorous, piscivorous, and insectivorous ecotypes (17,27,33-36,39). However, not all ecotypes were available for each population, either because they are extinct or very rare. For Bol'shoe Leprindo and Maloe Leprindo only the planktivorous ecotype was available and no phenotypic data were available for the small planktivorous ecotype from Kudushkit.
After collection a photograph of the left side of each fish was taken (Scottish samples) or individual fish were preserved in formaldehyde (Siberian samples) for subsequent phenotypic analysis. White muscle tissue (from underneath the dorsal fin and above the lateral line) and/or fin clips were taken for subsequent genetic and transcriptomic analysis and stored in absolute ethanol or RNAlater at -20ºC.
Analysis of linear morphological traits A total of 545 individuals from the Atlantic lineage (44 to 119 individuals per population, mean ± SD = 68.13 ± 26.24) and 984 individuals from the Siberian lineage (20 to 335 individuals per population, mean ± SD = 109.33 ± 105.89) were used for eco-morphological analysis based on linear traits (table S1). Seven linear morphological measurements and fork length were taken for each individual from photographs using ImageJ v. 1.50i (40) for the Atlantic populations or directly from formaldehyde fixated fish for Siberian samples ( fig. S2A; (17)): FL -fork length, HDO -head depth at operculum, HDEhead depth at eye, HL -head length, ED -eye diameter, ML -maxilla length, LJL -lower jaw length, PFL -pectoral fin length. All measurements, except HDO, were comparable between Siberian and Atlantic samples. Therefore, HDO was not used in combined analyses of both lineages. Linear traits were chosen based on previous studies on eco-morphological divergence in salmonid fishes and their potential functional importance (e.g. (17,35,41,42).
As these linear traits are correlated to body length, all measurements were scaled to mean fork length independently for Siberian and Atlantic populations using the allometric formula as described in (41): log 10 Y i = log 10 M i + b * (log 10 L m -log 10 where Y i is the corrected trait-value, M i is the measured trait value, b is the slope (regression coefficient) of the regression of the trait value against fork length (L i ) within each lake and ecotype, and L m is the mean fork length of all fish within a lineage. The slope was calculated using population and ecotype as covariates to take variation in mean fork length into account. For all subsequent statistical test, these sizeadjusted measurements were used.
Kruskal-Wallis tests with a post hoc Dunn-test were performed on corrected linear measures for each trait to test for significant divergence between sympatric ecotypes. Further, principal component analyses (PCA) were used to uncover the major axes of phenotypic variation in the Atlantic and Siberian lineage using prcomp in R. To determine the effectiveness of distinguishing ecotypes within and across populations based on all linear measurements (excluding HDO) combined and for analysing the degree of morphological differentiation among ecotypes within and across lakes, we performed a canonical variate analysis (CVA) using the Morpho R-package with 1,000 permutations for both lineages combined. Pvalues were adjusted for multiple testing using a Benjamini-Hochberg correction. We plotted the derived linear discriminant scores using ggplot2.

Analysis of phenotypic parallelism based on linear traits
In order to determine the contribution of parallel and non-parallel evolutionary effects on the overall morphological divergence of ecotypes within and across populations, we used the MANCOVA (multivariate analysis of co-variance) approach outlined in (43). This analysis was performed for each lineage separately and both lineages combined using all linear traits. Unimodal populations and the single benthivorous-insectivorous ecotype pair (Tokko) were excluded for the analyses of parallelism. MANCOVAs were used to determine the effect sizes of shared (parallel) divergence between ecotypes across populations (Ecotype effect), the unique evolutionary history of each population (Lake effect) and the differences in magnitude and direction of divergence (non-parallel effect) in each population (indicated by the Ecotype x Lake ('ExL') interaction effect). We estimated Wilk's partial h 2 using the Rpackage BaylorEdPsych to determine the explanatory ability of each variable compared to the unexplained total variance. We further used linear models to estimate effect sizes on a trait-by-trait basis for benthivorous -planktivorous and for planktivorous -piscivorous ecotype pairs, and estimated Wilk's partial h 2 using the BaylorEdPsych R-package.
Geometric morphometric analysis of body shape in bimodal Atlantic lineage populations In addition to the analysis of linear traits, we used geometric morphometric analysis to examine the degree of differentiation and parallelism in body shape. This analysis was done for the four bimodal Atlantic populations (Awe, Tay, na Sealga and Dughaill N = 347), as suitable high-quality photographs were not available for the other populations. For this subset of individuals, 23 homologous landmarks ( fig. S2B) on each image using tpsDIG2 (44) to investigate body shape. Landmarks were chosen based on previous studies and based on their previously established functional importance in foraging and locomotion (19-21,41,42). Shape differences due to bending of fish were removed using the unbend function in tpsutil (Rohlf 2004) based on landmark 1, 8, 22, 23 and 18. The resulting superimposed shape files were used for geometric morphometric analysis in geomorph (45). We performed a general Procrustes fit to remove variation due to size and orientation of individuals.
First, we analysed the full dataset with all four lakes together. We tested for homogeneity of allometric curves using the procD.allometry function. As allometric curves were not parallel we did not compare mean shapes but rather differences in the slope of allometric curves across ecotypes and lakes, using a linear model with the logarithmic centroid size as a covariate and the ecotype and lake interaction as grouping. The model was implemented using the advanced.procD.lm function in geomorph. We further performed a principal component analysis to explore the body shape variation within and across lakes and reveal the major axes of variation. Eigenvectors were plotted using ggplot2.
Similar to the analysis of linear traits we used individual ANOVAs for the first five principal component axes to assess the effect of shared parallel divergence (Ecotype), unique evolutionary history of each population (Lake effect), and the non-parallel aspects of divergence across lakes (Ecotype x Lake interaction). We estimated h 2 using the BaylorEdPsych R-package. Furthermore, we used a phenotypic trajectory analysis (PTA) (46) as a complementary analysis for determining the level of parallelism in body shape across the four Scottish populations. The PTA was performed based on Procrustes distances using the geomorph R-package.

Ecological analysis of Loch na Sealga ecotypes
We hypothesized the presence of distinct ecotypes, as individuals of comparable age within Loch na Sealga differ strongly in fork length ( fig. S4), which has been associated with ecological divergence in a range of other Arctic charr populations (e.g. (17,19)) and other salmonid species (e.g. Bernatchez 2010). Arctic charr were caught in Loch na Sealga during spawning time using benthic and pelagic gill nets. To determine the extent of ecological segregation among putative ecotypes in Loch na Sealga, we analysed parasite infection intensity and stable isotope signatures.
First, we analysed the infection intensity of individuals by Diphyllobothrium spp. larvae, a parasitic cestode following (20). The infection by this cestode is used as an indicator for a predominantly planktivorous diet, as this parasite is trophically transmitted through feeding on planktonic copepods, the first intermediate host for the Diphyllobothrium spp. parasite (47). Infection intensities between putatively planktivorous and benthivorous individuals were analysed using a Wilcoxon rank sum test in R.
Second, we performed a carbon and nitrogen stable isotope analysis on white muscle tissue to analyse the feeding behaviour of fish during the summer feeding period. The stable isotope analysis was performed as described in (20). Differences in stable isotope signatures between putatively planktivorous and benthivorous individuals were analysed using an ANOVA in R.
Chromosome-level assembly of the draft genome We created a draft chromosome-level assembly of the Arctic charr draft genome using a publicallyavailable genotyping-by-sequencing based high-density Arctic charr linkage map (48) and synteny between the Atlantic salmon and Arctic charr genomes. Ordered linkage map markers from the male and female linkage map were aligned to the draft genome using bowtie2 and the --very-sensitive setting. Markers with a mapping quality below 20 were subsequently removed. Furthermore, we analysed synteny between the Arctic charr draft genome contigs and the Atlantic salmon reference genome ((49), ICSASG_v2) using the software Chromosomer (50). Finally, we used mapped linkage map markers as well as the synteny results from Chromosomer as input for All-Maps (51) to create a chromosome-level assembly. The number of chromosomes were assigned based on the female linkage map. Furthermore, we used a weight of 2 for the female and male linkage map and a weight of 1 for the synteny results. We excluded the male linkage group 3 for the final assembly as it caused large misassembles.
Annotation of the draft genome To annotate this complex draft genome, we used a method realized in the GeMoMa 1.4.2 software (52). GeMoMa uses the homology-based prediction of genes with dynamic programming and intron position conservation supported by RNA-seq data to precisely predict genes in evolutionary related genomes. Intron positions were identified based on the STAR-aligned (53) bam files from 10 individuals (see below). Coding sequences were extracted from the NCBI RefSeq annotation of the salmon genome (49) and blasted (tblastn, (54)) against the charr genome in parallel with GNU PARALLEL (55). Next, genes were predicted using the CLI GeMoMa module with maximum allowed intron length of 70 kbp, a maximum number of five transcripts per gene and at least three split reads to infer an intron border position based on RNA-seq data. The gene predictions were further filtered using the GeMoMa Annotation Filter (GAF) with the following settings: relative score filter of 2, allowing non-complete gene predictions (due to the fragmented nature of the genome), common border filter of 0.2, and with all other settings set to default. The quality of the gene prediction was estimated using BUSCO v.1.22 (56) using the gene set mode (--m OGS) and the Actinopterygii database (actinopterygii_odb9) with 4584 mostly single-copy genes.
To find orthologs between Arctic charr and zebrafish (Danio rerio) genes (57) we used Orthofinder (58) and the longest isoforms of genes extracted from both species.
DNA extraction, library preparation and sequencing DNA was extracted from fin clips and muscle tissue using the NucleoSpin Tissue kit (Macherey-Nagel), following the manufacturers recommendations. DNA quality and quantity were assessed using agarose gel electrophoresis and the Qubit Fluorometer with the dsDNA BR Assay (Life Technologies). ddRADseq libraries were prepared using a modified version of the Recknagel et al. Amplification, sequencing and analysis of the mitochondrial ND1 gene The mitochondrial ND1 gene was amplified for 107 individuals (between 2 and 11 individuals per population) using the primer-pair B1NDF/B1NDF (59) and the PCR conditions as described in (59). The PCR product was cleaned and Sanger sequenced in both directions at DNA Sequencing and Services (MRC I PPU). Contigs were assembled from forward and reverse reads using Sequencher v. 5.4 (http://www.genecodes.com) after removing low quality reads and trimming read ends. Reads for all individuals were aligned using Muscle (60) in MEGA v. 7 (61) and trimmed to a common length. A TCS haplotype network was built in POPART (62).
Bioinformatic processing of ddRADseq data Raw sequence data quality was assessed using FastQC v.0.11.3 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). The process_radtags pipeline in Stacks version 1.46 (63) was used for demultiplexing raw sequence data based on unique barcodes, quality filtering and read-trimming to 70bp of all libraries. Cleaned and trimmed reads were aligned to the Arctic charr draft genome with bwa mem v. 0.7.15 (64) using a seed length of 25bp and the -M option. Reads with a mapping quality below 20 were removed using samtools v. 1.6 (65). We further used Stacks v. 1.46 and the ref_map.pl pipeline for building of RAD-loci and SNP calling. The populations module was used to export genotype calls in VCF format for further filtering in vcftools v. 0.1.15 (66). We created three different datasets; a global dataset for the Atlantic and Siberian lineages combined, and a separate dataset for each lineage. SNPs were retained when the following criteria were fulfilled: (i) present in at least 66% of all individuals within a population and 2/3 of all populations, (ii) global minor allele frequency (MAF) ≥ 0.05 or ≥ 0.01 for the global dataset, (iii) heterozygosity ≤ 0.5 (iv) in Hardy-Weinberg equilibrium (P > 0.05) in at least 2/3 of all populations and (v) with a minimum coverage of 6x. Filtering and data conversion of data into the different formats was performed using Stacks, vcftools, PLINK version 1.90 (www.cog-genomics.org/plink/1.9/; (67)) and PGDspider v. 2.11.2 (68). Datasets for creating site frequency spectra were filtered in a similar way, except no MAF cut-off was used in order to retain informative low frequency sites and a maximum of 10% missing data within each dataset (e.g. each ecotype pair) were allowed. For all analysis, only one SNP per locus was retained to reduce the effect of linkage.
Summary statistics and analysis of population structure We calculated nucleotide diversity for each ecotype by population and all populations using vcftools. To assess population structuring across and within each lineage, and within lakes, we used a range of different approaches using the global SNP datasets and lineage-specific datasets. First, we used principal component analyses in adegenet (69,70) to assess major axes of genetic variation for the global dataset and for each lineage specific dataset. Second, Admixture v.1.3 (71) was used with a ten-fold crossvalidation to detect the most likely numbers of clusters and genetic ancestry proportions within lineages.
Genodive v.2.0b27 (72) was used to estimate pairwise genetic differentiation (Fst; (73)) between ecotypes within and across lakes using the global dataset, and significance was assessed using 10,000 permutations. We also assessed the relationship among populations within and across lineages using a neighbour-joining network as well as neighbour-joining tree based on uncorrected P distances using SplitsTree4 v.4.14.4 (74). Bootstrap support values for the neighbour joining tree were calculated using 100 bootstrap replicates.
Furthermore, to assess genetic co-ancestry among individuals within and across lakes, we used haplotype-based population inference approach implemented in fineRADstructure v.0.1 (75). We used the same filtering criteria as for the SNP dataset and performed the analyses using default settings. In brief, we calculated the co-ancestry matrix using RADpainter as implemented in the fineRADstructure package and subsequently inferred populations using a burn-in of 100,000 and 100,000 MCMC iterations. Finally, the results were visualized using the provided R-scripts (http://cichlid.gurdon.cam.ac.uk/fineRADstructure.html).
Analysis and visualization of introgression and differential admixture We first used Treemix version 1.13 (76) to explore and visualize secondary gene flow within each lineage. We build maximum likelihood trees only for non-admixed individuals (admixture threshold of 0.25 as inferred with Admixture) to reduce the effect of contemporary admixture on the inference of secondary gene flow. We fitted up to six and ten migration edges for the Atlantic and Siberian populations, respectively, and chose the strongest and most explanatory migration events based on the maximized variance explained, the maximum likelihood of the tree and the significance of each migration edge. To quantitatively test for secondary gene flow in the tree we used f4-statistics implemented in the fourpop function of Treemix. Furthermore, to formally test for differential admixture or introgressive hybridization in a three-taxon comparison (Target taxon C; Reference taxon A, Reference taxon B) we used f3-statistics as implemented in the threepop function in Treemix. Significant negative deviations from zero indicate the presence of differential admixture, e.g. in the form of introgressive hybridization, of two populations (e.g. A and B) in a third population (target population C).

Inference of evolutionary histories
To distinguish between alternative evolutionary scenarios leading the ecotype diversity within lakes, we used coalescence simulations implemented in fastimcoal2 v.2.5.2.3 (25) and information contained in the multidimensional site frequency spectraum (SFS). 2-population and 3-population SFSs were created using dadi v.1.6.3 (77) for each lake population and parapatric outgroups if appropriate. Populations were down-sampled by around 30% to reduce the effect of missing data (maximum 10% missing data in used SNP datasets). Since no trinucleotide substitution matrix is available for salmonids and no outgroup species was sequenced for ancestral state reconstruction and correction, the minor folded site frequency spectrum was used. To determine absolute values for divergence times and other inferred parameters, we corrected the number of monomorphic sites in the SFS as described in (78). Adjusting the number of monomorphic sites to the number of variant sites used to build the SFS is necessary to determine absolute parameter values, as only one SNP per RAD locus was retained for the analysis, skewing the ratio of monomorphic to variant sites (78). Furthermore, a mutation rate of 1x10 -8 was used as no accurate mutation rate for Arctic charr or salmonids is available (26).
For all sympatric ecotype pairs, seven pairwise demographic models describing different historical divergence scenarios were tested (fig. S17): Strict isolation (SI), Ancient migration (AM), Isolation-with-migration (IM), Secondary contact (SC), Secondary Contact with introgression (AdmSC), Isolation-with-migration with a historical change in migration rate (IMchange) and an IM-model with a historical introgression event sympatric species (IMint) were tested. In lakes with three sympatric ecotypes (Kamkanda and Kalarskii Davatchan), or strong admixture across lakes (Loch Dughaill and Loch Uaine), models describing different combinations of strict isolation, isolation-with-migration, secondary contact, introgression and hybrid speciation were tested for all three ecotypes/populations together ( fig S16). These models tested in general two different evolutionary histories. The isolation-withmigration models test a history of divergence under constant gene flow (with differing rates), whereas secondary contact models mainly test the occurrence of historical secondary contact between different distinct lineages or populations (e.g. distinct gene pools or glacial refugial populations) prior to ecotype divergence, and not necessarily multiple invasions.
We ran a total of 30 iterations for each model and lake, and selected the most likely model based on the AIC (25). Each run consisted of 40 rounds of parameter estimation with 100,000 coalescent simulations. Point estimates of inferred parameters were taken from the most likely model and averaged over the top five runs.
Outlier analysis, positive selection, and genome-wide patterns of differentiation To determine outlier loci between sympatric ecotypes, we screened the genome for loci showing genetic differentiation (Weir-Cockerham Fst calculated using vcftools) above the 95 th quantile of the Fst distribution (outlier loci). We estimated the number and proportion of pairwise shared outlier SNPs across benthivorous-planktivorous and piscivorous-planktivorous ecotype pairs, and identified potential loci under parallel selection. We also identified contigs containing outlier SNPs (but not the same SNPs across contigs) that are shared among ecotype pairs. In order to determine if more outlier SNPs or contigs are shared among two ecotype pairs than expected by chance, we resampled N number of SNPs (N = number of outlier SNPs for each ecotype pair) 10,000 times with replacement from the full dataset and determined the mean number of shared SNPs from that distribution. We used a Chi-square test to determine if more outlier SNPs/contigs are shared than expected. We also plotted the standardized Fst (ZFst) against delta nucleotide diversity (∆π = π ecotype2 -π ecotype1 ) between sympatric ecotypes, to determine if outlier loci show reduced genetic diversity in one or both ecotypes.
We used pcadapt 3.0.4 (79) to identify SNPs potentially under positive selection in each lineagespecific dataset. As the PCAdapt approach is based on detecting loci that separate along different principal components, we used the first 10 and 14 principal component axes for the Atlantic and Siberian populations, respectively, as these separate the different lakes and sympatric ecotypes within lakes. We considered loci with q-values of 0.01 or below (false discovery rate of below 1%) to be candidates for being under positive selection. P-values were transformed to q-values using the qvalues R-package.
Genome-wide association analysis To detect loci significantly associated with ecotype, we used two different approaches, a univariate linear mixed model (LMM) and a Bayesian sparse linear mixed model (BSLMM) implemented in GEMMA (80). BSLMM also calculates a set of hyperparameters describing the genetic architecture underlying trait (ecotype) variation. For these association analyses, ecotypes were coded numerically, with planktivorous as 0, benthivorous as 1 and piscivorous as 2. We excluded unimodal-planktivorous outgroup populations from Scotland and the ecotypes from Tokko from this analysis, and performed the association analysis separate for each lineage. As these approaches do not allow missing data in the dataset, we imputed missing data using the LD-kNNi method (81) implemented in Tassel5 (82). This method has been specifically designed and tested for RADseq data and is based on linkage to the k-nearest neighbours. We imputed missing genotypes based on the 10 closest, in terms of linkage and not genomic position, genotypes using the default settings. To test the imputation accuracy, we calculated the Pearson correlations between allele frequencies before and after imputation for the full dataset and the subset with the highest proportion of missing data. Genotype and phenotype data were formatted in Plink binary ped file format.
As the LMM and BSLMM approaches correct for the presence of population structure using a relatedness matrix, we calculated a centred relatedness matrix for each lineage using GEMMA. First, we used a linear mixed model to test for association between genotype and ecotype using the Wald test and report p-values. We used a conservative cut-off of a = 0.0001 and -log10(a) = 4 to determine statistical significance of associations (83). Second, we used a Bayesian sparse linear mixed model to determine associations of genotypes with ecotype and estimate the genetic architecture underlying ecotype divergence. In contrast to single-SNP association mapping approaches, such as the linear mixed model, BSLMM is a polygenic model that simultaneously assesses the association of multiple SNPs with trait variation and performs well for a wide range of genetic architectures underlying phenotypic variation, ranging from polygenic architectures to 'sparse' architectures with a few large effect loci underlying phenotypic variation (80). Furthermore, the BSLMM model also estimates a range of hyper-parameters describing the genetic architecture underlying phenotypic variation. We ran ten independent chains for the standard linear BSLMM, each with 25 million generations for the Atlantic dataset and 50 million generations for the Siberian lineage, excluding the first 5 and 25 million generations as burn-in for the Atlantic and Siberian dataset, respectively. We assessed convergence of the runs based on the mixing of the hyper-parameters. We estimated the posterior inclusion probability (PIP) for each SNP, the proportion of generations a SNP has an effect on phenotype, by averaging across the ten independent chains. We used a stringent cut-off of PIP > 0.1 to identify candidate SNPs associated with ecotype. Furthermore, we summarized hyper-parameters describing the genetic architecture underling ecotype divergence within each lineage by calculating the mean and 95% confidence intervals for the parameter posterior distributions across all independent chains.
In order to assess if allele frequencies of candidate loci are similar across parallel ecotypes within lineages, we first performed principal component analyses using BSLMM candidate loci for all lakes within each lineage. Subsequently, we performed two-way ANOVAs on individual scores for each principal component separately (PC1-5 in Atlantic and PC1-9 in Siberian lineage), with ecotype, lake and ecotype-lake interaction as variables. Similar to the phenotypic analysis, we then estimated the effect size (h 2 ) of each variable on the genetic variation along each significant principal component using the BaylorEdPsych R-package.
Annotation and functional characterization of Fst outlier loci and GWAS candidate loci We identified all annotated genes containing ecotype associated loci (LMM and BSLMM) and those under selection (PCAdapt) and the closest gene within 10 kb of those SNPs. We used all genes with zebrafish orthologues (one-to-one or one-to-two in zebrafish) for an Overrepresentation Enrichment Analyses (ORA) in Webgestalt (84), using the zebrafish genome annotation as background. Identified KEGG pathways, both significantly enriched and none-enriched, were then compared across ecotype pairs to identify parallel pathways. Pathways were identified as significantly enriched with a falsediscovery rate (FDR) below 0.1.
Correlation of genetic divergence, phenotypic divergence and ecosystem size Identifying the drivers of phenotypic and genetic divergence is important for understanding the processes that lead to deviations from parallelism or even parallelism itself. We tested for correlations between the degree of phenotypic differentiation (Pst), genetic divergence (mean neutral Fst and mean outlier Fst) and ecosystem size (PC1), as a proxy for ecological opportunity (24), for all ecotype pairs combined and for benthivorous -planktivorous and piscivorous -planktivorous pairs separately (genetic divergence vs. phenotypic divergence, genetic divergence vs. ecosystem size, phenotypic divergence vs. ecosystem size) using Pearson's correlation (cor.test function in R). We used the first principal component from a PCA performed using prcomp based on maximum lake depth and surface area, estimated using Google Earth Pro, as a proxy for ecosystem size (24). Pst values between all sympatric ecotype pairs were calculated for each trait and the mean was calculated across trait-specific Pst-values to estimate the proportion of phenotypic variance among ecotypes relative to the total variance (4) using the Pstat R-package. Pst values are unit-less and directly analogous to Fst (4). We corrected for multiple testing using a Benjamini-Hochberg correction implemented in the R-package p-adjust. However, due to the relatively small number of data points (N = 14) and the large number of pairwise comparisons, we were primarily interested in the strength of correlation rather than p-value significance, treating Spearman correlations (rho) over 0.5 as relevant for hypothesis generation and further analyses (4).
RNA extraction, sequencing and alignment RNA was extracted from white muscle tissue from 44 individuals (N = 4 per ecotype per lake) from five different lakes (Awe, Tay, Dug, Kam, Tok), representing all possible ecotypes (benthivorous, planktivorous, insectivorous, piscivorous), using PureLink RNA Mini Kits (Life Technologies, Carlsbad, CA). Extractions were mostly carried out following the manufacturer's instructions, with an additional homogenization step using a FastPrep-24 (MP Biomedicals), before RNA extraction (99). RNA was quantified using the RNA HS Assay kit for the Qubit 2.0 fluorometer (Life Technologies, Carlsbad, CA) and quality was assessed using the 2200 Tapestation using the High Sensitivity RNA ScreenTape (Agilent, Santa Clara, CA). Extracted RNA was of high quality, yielding A260/280 ratios between 1. Raw reads were processed using Scythe v0.9944 BETA (https://github.com/vsbuffalo/scythe/) and Trimmomatic v0.36 (85) to remove any adapters, ambiguous bases and low-quality sequences prior to alignment. Leading and trailing bases with a Phred quality score below 20 were removed and a sliding window approach (4 bp window size) was used to trim reads at positions with Phred scores below 20. A minimum read length of 50 bp was allowed after trimming. We used FastQC v0.11.2 to assess readquality before and after processing. Processing removed 2% of all reads, resulting in a dataset of 1.81 billion cleaned reads. The resulting reads were aligned against the Arctic charr draft genome using STAR v2.5.2b, with default parameters.
Differential gene expression analysis To identify differentially expressed genes, reads were counted for each gene based on the longest isoform annotations using the HTSEQ-count python package (86) with the unstranded (--stranded=no), CDSbased (--type=CDS, --idattr=Parent) settings. The read count table was used for the differential gene expression analysis between sympatric ecotypes using DESeq2 package (87) on a per lake basis. Only genes with at least 20 read counts per lake were used.

Network analysis
To further examine the functional bases of trophic divergence Arctic charr, we used a Weighted Gene Co-Expression Network Analysis (WGCNA) to identify modules of co-expressed genes (88). Network analyses were only performed on benthivorous-planktivorous ecotype pairs (from Awe, Tay, Dug and Kam, n=32, eight per lake, four per ecotype within lake), given that sample sizes for insectivorous and piscivorous ecotypes were too low (n = 4 per ecotype) to robustly identify co-expressed network modules for these ecotypes. Prior to further analyses, count data were transformed to a log 2 scale using the rlog function in DESeq2.
To reduce stochastic background noise in our expression data resulting from lake-specific effects, we used the R package variancePartition v3.6 (89) to identify a subset of genes with expression variation attributed to ecotype. VariancePartition applies a linear mixed model (LMM) to partition the expression variance for each gene into the percentage of expression variation explained by 'lake' and 'ecotype' effects. All remaining expression variation not attributable to these factors was termed residual variation. Batch effects were removed from expression data using an initial LMM, with 'batch' as a random effect, as per the varianceParition pipeline (89). The residuals from the initial model were then used as the response variable for a subsequent LMM, with 'lake' and 'ecotype' specified as explanatory variables; 'lake' and 'ecotype' represent categorical variables and were thus modelled as random effects. VariancePartition assesses each gene individually and the results for all genes are aggregated afterwards (89). All genes for which 'ecotype' explained more than 10% of the total expression variation across individuals were used for network construction.
A single network was constructed for benthivorous and planktivorous individuals from the set 1,512 ecotype-associated genes. Following the standard procedure of WGCNA, pair-wise Pearson's correlations were calculated for all genes and raised to a power 12 to generate the adjacency matrix. A topological overlap matrix (TOM) was then calculated from the adjacency matrix to measure network interconnectedness (the strength of a co-expression relationship between any two genes with respect to all other genes in the network; (90)). Genes with highly similar co-expression relationships were clustered using average linkage hierarchical clustering of the topological overlap dissimilarity measure (1-TOM). Finally, network modules were defined using the dynamic treecut algorithm, with a minimum module size of 25 genes and a cut height of 0.992 (91). The module eigengene distance threshold was set to 0.25 to merge highly similar modules. Modules of co-expressed genes were labelled by color (assigned arbitrarily by WGCNA).
To determine the significance of module-trait relationships, Spearman's correlations were calculated between module eigengenes (defined as the first principal component of the expression profile for a given module) and trait measurements (lake and ecotype). Module -trait relationships were considered significant when p-values were <0.05 after Benjamini-Hochberg correction. The direction of correlation (i.e. positive or negative) determined whether modules were associated with expression in benthivorous or planktivorous ecotypes. Trait values for ecotype were specified in binary format (planktivorous = 0 and benthivorous = 1), therefore positive correlation represent up-regulation in benthivorous ecotypes (over planktivorous), and negative correlations represent up-regulation in planktivorous ecotypes (over benthivorous).
Functional characterization of differentially expressed genes To find genetic pathways associated with ecotype divergence in Arctic charr, we performed functional enrichment analyses on differentially expressed genes (all pair-wise comparisons of ecotype, across all five lakes) and co-expressed gene modules (benthivorous -planktivorous ecotype comparisons, form Awe, Tay, Dug and Kam), using the web interface of the WebGestalt tool (84). For differentially expressed gene sets we used Gene Set Enrichment Analyses (GSEA; (92)) to identify enriched between different ecotypes. Before the analysis, the DESeq2 raw p-values for each lake-ecotype comparison were modified to create a ranked list of genes using the following formula: -log10(p-value)*log2FoldChange. To identify enriched pathways within co-expression modules we performed an Overrepresentation Analyses (ORA) for each module individually.
For the GSEA and ORA analyses, we used only charr genes which had 1:1 or 2:1 orthologs in the zebrafish genome. The analysis was performed with the following settings: minimum number of genes for a category = 5, maximum number of genes for a category = 500, number of permutations = 1000, number of categories with leading-edge genes = 20, KEGG pathways (93), organism = Danio rerio.
Gene sharing between comparisons. To calculate an expected number of the shared DEGs between the comparisons we used a permutationbased approach, similar to the outlier comparison (see), with 10,000 permutations. We randomly sampled n genes (n = the number of DEGs in a comparison) 10,000 times from each dataset with replacement and calculated the expected number of shared DEGs as the mean number of shared resampled genes in each comparison. We calculated proportion-based p-values based on the number of observed shared DEGs, expected DEGs and the total number of genes for each pairwise comparison using R function prop.test.

Supplementary Results
Ecological divergence in Loch na Sealga Arctic charr from Loch na Sealga that were classified as benthivorous and planktivorous based on stomach content were strongly differentiated in the intensity of Diphyllobothrium spp. infections ( fig.  S3B; Wilcoxon rank sum test: P < 0.001), indicating strong segregation in trophic niche (47). Furthermore, the stable isotope analyses revealed that planktivorous and benthivorous ecotypes displayed divergent nitrogen signatures ( fig. S4A; F 1, 62 = 20.65, P < 0.001). Planktivorous and benthivorous also differed in the length of the lower jaw ( fig. S4; table S2; Pst LJL = 0.783) and fork length ( fig. S4; table S2; Pst SL = 0.967). These results indicate the presence of ecologically and morphologically distinct ecotypes, although this divergence is relatively subtle in most aspects.
Continuum of eco-morphological divergence across lakes We used seven linear morphological traits and fork length to analyze the extent of morphological divergence between ecotypes within and across lakes and lineages. Allometric effects cannot be fully removed and body size will in some cases have an effect on the downstream analysis, where trait variation and body size variation are strongly correlated. However, the fact that the allometric correction fully removed the effect of body size for most traits, such as head depth at eye (Spearman test: rho = -0.002, P = 0.94), head length (Spearman test: rho =-0.05, P = 0.052), and lower jaw length (Spearman test: rho = -0.039, P = 0.1251), and that other traits only display very weak although significant (P < 0.05) correlations with individual fork length, i.e. eye diameter (rho = -0.29) and maxilla length (rho = -0.10), suggests that the effect of body size on downstream analyses are negligible (35).
We found that the main axis of morphological variation in the Atlantic lineage primarily separated the different bimodal populations (figs. S5 and S6). Sympatric (benthivorous and planktivorous) ecotypes split along PC2 in Awe, Tay and na Sealga but mainly along PC1 in Dughaill. In the Siberian lineage, however, the main axis of morphological variation (PC1) differentiated benthivorous and planktivorous ecotypes, followed by the split of planktivorous and piscivorous ecotypes along PC2 (figs. S5 and S6). Furthermore, the insectivorous ecotype in Tokko separated from the benthivorous and planktivorous ecotypes along PC1, but also split from all other ecotypes along PC3 (figs. S5 and S6).
The canonical analysis of variance for both lineages combined showed that individuals clustered by ecotype, even across lineages, but with substantial overlap in morphospace (Fig 1C, fig. S7). The average assignment success of individuals to the respective ecotype across lakes was 70.5%. The overall degree of divergence between sympatric and allopatric ecotypes differed across comparisons (measured as mahalanobis distance), and in several cases trophically similar allopatric ecotypes could not be significantly differentiated based on morphology, for example the planktivorous ecotype from Bol'shoe Leprindo from the planktivorous ecotypes in Kamkanda and Kiryalta-3 or the piscivorous ecotype in Kudushkit from the piscivorous ecotypes in Kamkanda or Kalarskii Davatchan ( fig. S8). However, all sympatric ecotypes, except in na Sealga (Mahalanobis distance = 1.295, p = 0.1), could be significantly distinguished based on overall morphology. Overall, we found that the level of morphological divergence varied strongly across lakes but that sympatric ecotype pairs were significantly diverged, at least in one or more traits.
Geometric morphometrics of body shape In addition to linear traits, we also examined the level of divergence and parallelism in body shape using geometric morphometric analysis on the four Atlantic benthivorous -planktivorous ecotype pairs (Tay, Dughaill, Awe, na Sealga). The principal component analysis showed that all sympatric ecotypes split along PC1 (38.2%), and to varying degrees along PC2 (20.3%) (Fig. 1A). Most of the variance in body shape along PC1 and PC2 was explained by differences in head and eye size compared to the rest of the body, as well as body depth (Fig. 1A). Benthivorous ecotypes had in general larger heads and bigger eyes in relation to their body size.
To test for differences in body shape among ecotypes and lakes, we first analyzed all ecotype pairs combined and compared differences in allometric curves among sympatric ecotypes, as allometric curves were not homogenous across lakes (F = 1.576, Z = 5.67, p = 0.001). We found that sympatric ecotypes in Awe (Z = 5.4, p = 0.003), Tay (Z = 5.6, p = 0.006) and Dughaill (Z = 2.2, p = 0.044) significantly differed in the angle of allometric slopes but not distances, meaning that shape changes differently among these ecotypes during ontogeny (table S5). Allometric curves did not differ between ecotypes in Loch na Sealga (table S5; Z = -0.38, p = 0.569). Using a similar approach to the analysis of linear trait, separate ANCOVAs for each principal component showed that most of the phenotypic trait variation (PC1) was explained by the parallel ecotype effect (table S6: h 2 PC1, Ecotype = 0.486), even more than the effect of evolutionary history (h 2 PC1, Lake = 0.391). However, the non-parallel 'lake term' and 'ecotype x lake interaction' term explained most of the variation along PC2 to PC5, suggesting that the head region and body depth show the strongest level of parallelism across populations (Fig. 1A). To further explore the level of parallelism in overall body shape, we used a phenotypic trajectory analysis (PTA) (43), which takes the full phenotypic variance for all PCs into account. The PTA revealed that Tay and na Sealga are parallel in the angle of phenotypic trajectories (table S7; (table S7; magnitude of change). This shows that some populations are more parallel in the direction of change and others more in the magnitude of change. Overall, this demonstrates that even though that most ecotype pairs are not fully parallel in body shape, that certain aspects of their shape, mainly in the head region, show significant parallelism across lakes.
Genome annotation, orthology inference and chromosome assembly.
We were able to assemble 62.6% of the draft genome (1.346 Gb) into 39 Arctic charr chromosomes (48) with a total of 26.7% (0.573 Gb) being oriented. 37.4 % (0.804 Gb) of all contigs could not be anchored to the linkage map.
We annotated 44102 genes with 55443 alternative isoforms using homology-based GeMoMa gene prediction software. The BUSCO analysis estimated that the annotation contained 88% of completely assembled BUSCOs and 3.4% of fragmented BUSCOs, while 7.6% of BUSCOs were missing. Among the 88% completely assembled BUSCOs, 62% were duplicated, which is consistent with the estimation for the salmon genome (49,100).
Orthofinder returned 15485 orthogroups with a mean and median orthogroup size equal to 2.7 and 2, respectively, which can be explained by the relatively recent whole genome duplication in salmonids (49). We detected 8428 single-copy orthogoups.
Genetic clustering and shared ancestry Using a combined (12,215 SNPs) and lineage-specific (Siberia: 4475 SNPs; Atlantic: 6,039 SNPs) SNP datasets, we analyzed the major axes of genetic variance, patterns of genetic clustering and shared ancestry. We used principal component analyses and individual clustering analyses to infer the major axes of genetic variation, genetic ancestry within and across populations and infer potential modes of speciation.
The combined principal component analysis revealed a main clustering of populations by lineage and then catchment within each evolutionary lineage (figs. S11 and S12). Under the scenario of a single colonization event of each lake and a subsequent intralacustrine diversification, we would expect the divergence along independent principal components within each population and that the resulting ecotypes would be equidistant to all other outgroup populations (94,95). However, contrary to our expectations we found that ecotypes in several intralacustrine radiations did not diverge along independent principal components but often split along non-independent axes and share varying degrees of ancestry with other populations (fig. S12). In Kiryalta-3 and Kiryalta-4, populations split along the same principal components (PC9 and PC11) and clustered by ecotype ( fig. S12). Similarly, in Dughaill and Uaine, we observed that the planktivorous Dughaill ecotype is equidistant to the benthivorous Dughaill ecotype and the unimodal population from Uaine. This was supported by significant f3-statistic, showing significant introgression from the Uaine population and the benthivorous Dughaill ecotype (or ancestral populations) into the planktivorous ecotype (table S10). This significant signals of selection was also detected based on allopatric comparisons (Tay-bn:Uai or Awe-pl:Uai), suggesting that introgression occurred through an ancestral lineage that is genetically similar to the benthivorous Dughaill ecotype and southern Scottish populations (i.e. Tay, Awe, Eck). This is supported by considerable sharing of mitochondrial haplotypes between those populations. In other cases, this genetic clustering is less extreme but differences in shared genetic ancestry could still be observed. For example, the planktivorous ecotype in Tay is genetically closer to the Loch Awe population than the benthivorous Tay ecotype, and the planktivorous ecotype from Davatchan is genetically intermediate between the benthivorous Davatchan ecotype and the (Bol'shoe and Maloe) Leprindo populations (figs. S11 and S12). In general, these scenarios of secondary gene flow across populations were supported by significant f4-statistics for almost all populations (table S11), indicating that gene flow was prevalent across populations.
However, ecotypes in Kamkanda and Kalarskii Davatchan split along independent principal components (Kam: PC10, KDa: PC8), suggesting their divergence in sympatry (94). The benthivorous ecotype in Kamkanda is intermediate and equidistant to the piscivorous and planktivorous ecotypes, supporting a divergence in sympatry with similar levels of gene flow between ecotypes. In Kalarskii Davatchan, the small-piscivorous ecotype is placed intermediate to the large-piscivorous and the planktivorous ecotype, but genetically more similar to the large-piscivorous ecotype, supporting the results of the f3-statistics (table S10), suggesting introgression into the small-piscivorous ecotype from the other sympatric ecotypes or historical populations, because signals of introgression were also significant for allopatric and more distant populations (the piscivorous Kudushkit and benthivorous Tokko ecotypes).
The analyses of genetic co-ancestry (figs. S13 and S14) revealed varying levels of shared genetic co-ancestry across populations and ecotypes, with mostly higher levels of shared co-ancestry within catchments (e.g. Dughaill and Uaine or Tay and Rannoch; fig. S13) but also in some cases across more distant populations (e.g. the planktivorous ecotype in na Sealga showed higher genetic co-ancestry with Eck than the benthivorous na Sealga ecotype). Similar patterns were observed for the Siberian populations (fig. S14), with for example Tokko sharing different levels of genetic co-ancestry with Kamkanda and Kalarskii Davatchan.
Thus, these combined analyses of genetic clustering and shared co-ancestry suggest a complex history of ancestral admixture and subsequent ecotype divergence.
Demographic models To test whether sympatric ecotypes evolved under a scenario of secondary contact (historical contact and ancestral admixture of different lineages or gene pools) or completely in the presence of gene flow only between the contemporary sympatric ecotypes, we used a demographic modelling approach based on the folded joint allele frequency spectrum. We found that the most likely demographic history of most ecotype pairs in the Atlantic and Siberian lineage include signs of historical secondary contact and/or introgression (table S12). These models tested that ancestral populations or lineages came into secondary contact before or while colonizing newly formed postglacial lakes, presumably following the retraction of Pleistocene glaciers (17,96). These admixed populations most likely subsequently diverged phenotypically in sympatry due to disruptive selection.
We found that the contemporary genetic diversity in most Siberian populations most likely originated under a secondary contact scenario, in which distinct ancestral populations (two or more) came into secondary contact after the retraction of Pleistocene glaciers ( fig. S18, table S12). The mean divergence time among ancestral populations was 8180 generations, which translates into around 24,450 -81,800 years for generation times of 3 -10 years (interval of the first maturation of different ecotypes) ((27), personal observations), and a mean timing of introgression around 861 generations (2,583 -8,610) years ago (fig. S18). The demographic modelling approach could not fully resolve the evolutionary history of ecotypes in Kudushkit. Although the secondary contact model (SCint) in Kudushkit is more likely than the model of isolation-with-migration and introgression (IMint) (∆AIC = 4.47), we inferred a recent divergence time between genetic clusters of 1,580 generations ( fig. S18), which corresponds into a divergence time of 9,480 -20,540 (mean = 15,010) years (mean generation time of 9.5 years, ~6 years in the planktivorous and ~13years in the piscivorous ecotype; (33)).
In the Atlantic lineage, the most likely model of divergence among sympatric ecotypes was the isolation-with-migration and introgression (IMint) model, in which ancestral populations and/or ecotypes diverged under gene flow and experienced an introgression event at some point in time with a change in migration rate (fig. S18). However, inferred divergence times and therefore the interpretation of the evolutionary history differed strongly across lakes. In Dughaill and Tay, ecotypes most likely diverged prior to the last glacial maximum around 20,699 and 10,931 generations ago (62,097 and 32,793 years with a 3-year generation time), respectively, with a respective timing of secondary contact around 90 and 1176 generations ago (270 -3,528 years). Although gene flow could be detected between the postglacial lineages during the last glacial maximum, the rate of gene flow was around one to two magnitudes lower before the secondary contact compared to contemporary gene flow ( fig. S18). In contrast, ecotypes from Awe and na Sealga genetically diverged around 1,245 and 4,795 generations ago (3,735 -14,385 years; 3year generation time), so most likely after the last glacial maximum. Introgression in Awe and na Sealga occurred around 0-10 generations ago, meaning the model effectively converged into an isolation-withmigration model. However, it could also be caused by very recent introgression into those populations from an external population. In Awe and na Sealga, the inferred rates of gene flow prior to introgression (m avg.prior,Awe = 3.75e -3 ; m avg.prior,naS = 3.18e -3 ) were stronger compared to recent gene flow after introgression (m mean.post,Awe = 1.53e -9 ; m mean.post,naS = 5.48e -4 ).
In some cases, we cannot significantly distinguish competing models with a threshold of ∆AIC of 4, mainly because of limited information contained in the site frequency spectrum and complex evolutionary histories. Therefore, other modes of speciation or variations of the tested modes are also possible (table S12). However, in most cases the next best model is a variation of the best fitting model, e.g. secondary contact without introgression (SC) rather than with introgression (SCint).
As pairwise comparisons in systems with more than two ecotypes are prone to underestimate gene flow and misinterpret scenarios of introgression and gene flow, we tested between six to eight different models for each system with three ecotypes (Kamkanda, Kalarskii Davatchan and Dughaill-Uaine) using different combinations of the pairwise models ( fig. S17). Due to the significant f3 statistics for the small-piscivorous Kalarskii Davatchan ecotype (KDa-pisc-s) and the planktivorous Dughaill ecotype (Dug-pl), suggesting a significant history of introgressive hybridization, we also tested models describing introgression and hybrid speciation in those two systems.
In the case of Loch Dughaill and Uaine, the most likely model was the IMint model (∆AIC to next best model = 89.32; table S12, fig. S18), similar to the other populations from the Atlantic lineage. Under this model, the oldest split between the benthivorous Dughaill ecotype and the ancestor of the planktivorous Dughaill ecotype and Uaine population occurred before the last glacial maximum, 16,690 generations ago, which is about 50,000 years with a generation time of three years. The planktivorous ecotype from Dughaill and the Loch Uaine split after the LGM around 3,954 generations ago (~12,000 years ago). We also detected a recent introgression from the benthivorous Dughaill ecotype and Uaine into the planktivorous Dughaill ecotype, with nearly complete admixture from Uaine into the planktivorous Dughaill ecotype (adm = 1.00). Only weak admixture occurred between the benthivorous and planktivorous ecotypes in Dughaill (adm = 0.003).
Both Siberian populations, Kalarskii Davatchan (KDa) and Kamkanda (Kam) most likely evolved under an evolutionary history of secondary contact scenario (SC) without significant introgression (table S12, fig. S18). This contradicts the f3-statistics that showed significant introgressive hybridization in KDa. However, the next best model in KDa is the IntSC model that includes introgression and is also weakly supported (∆AIC = 5.27), so the possibility of introgressive hybridization cannot be fully excluded. In Kamkanda we cannot exclude the possibility of isolation-with-migration as the mode of divergence, as the IM model with ongoing gene flow during divergence was also highly supported compared to the SC model (∆AIC = 0.32). In both cases, though, the oldest split most likely occurred before the LGM (Kam: 10,716 generations; KDa: 3,783 generations). Generation times of piscivorous ecotypes in Siberian lakes are significantly higher compared to Atlantic and other Siberian ecotypes (10 years and older), translating it into divergence times of up to more than 107,160 years in Kamkanda (mean = 80,370 with a mean generation time across ecotypes of roughly 7.5 years, (35) and up to more than 37,830 in Kalarskii Davatchan (mean = 28,373 years with a mean generation time across ecotypes of roughly 7.5 years, (33).
Genomic patterns of differentiation We compared genomic patterns of relative differentiation (Weir & Cockerham Fst) and genetic diversity (π) across ecotype pairs to identify differences and similarities in genome-wide patterns, particularly among ecotype pairs with different evolutionary histories. Patterns of ∆π (π (ecotype1) -π (ecotype2)) plotted against the standardized Fst (ZFst) differed across ecotype pairs, with ecotype pairs that most likely diverged after the last glacial maximum (Awe, na Sealga) showing strong differential within-group diversity (∆π ≠ 0) at outlier loci (ZFst > 4) compared to ecotype pairs that most likely diverged before the last glacial maximum after secondary contact ( fig. S21). In Kudushkit, the pattern of genetic differentiation is intermediate and does not clearly support one mode of speciation, similar to the demographic modeling results, which suggest a recent (post-LGM) divergence (figs. S18 and S21).
Ecotype pairs that show strong support for divergence under secondary contact also show high average genome-wide differentiation with several fixed variants across the genome, ranging from a maximum of 76 fixed SNPs in Davatchan (3.33% of the genome) to a minimum of 1 fixed SNP in Kudushkit and in the planktivorous-piscivorous ecotype pair in Kamkanda (table S8). In general, the number of fixed differences is higher in the Siberian lineage (mean number of fixed SNPs = 16.4) compared to Atlantic populations (mean number of fixed SNPs = 1).

Sharing of Fst outlier loci on Contig985
In context of the outlier SNPs shared across replicates, Contig985, which is located on LG1, contained outlier loci in five out of six benthivorous-planktivorous ecotype pairs in a 4.2 Mb region, and was only significantly differentiated in one of the planktivorous-piscivorous ecotype pairs. Genes in this region are involved in a range of pathways (dre00512, dre00030, dre00052, dre00052, dre00010, dre03018, dre01230, dre04070, dre01200, dre04110) of which several are also enriched for differentially expressed genes (Fructose and mannose metabolism dre00052; Biosynthesis of amino acids dre01230 and Glycolysis / Gluconeogenesis dre00010, table S16) or modules of co-expressed genes (Phosphatidylinositol signaling system dre04070, table S19).
Correlation of phenotype, genotype and environment To gain insights into the potential drivers of phenotypic and genetic divergence among sympatric ecotypes, we investigated correlations of within-and between-group genetic diversity (π, Fst), phenotypic divergence (Pst for each trait and mean across traits) and ecological opportunity (PC1 for ecosystem size). We found a significant correlation between the degree of genetic differentiation (calculated based on outlier loci, Fst above 95 th quantile) and mean phenotypic divergence (mean Pst) (table S9; Pearson's r = 0.695, P = 0.038) and for pectoral fin length (Pst-PFL) and genetic differentiation (table S9; Pearson's r = 0.699, P = 0.038). Neutral genetic differentiation was also correlated with phenotypic divergence, but less strongly and was not statistically significant after correction for multiple testing. Variation in the mean phenotypic divergence among sympatric ecotypes may be explained by variation in ecosystem size, a proxy for ecological opportunity (24), although the correlation was not significant (Pearson's r = 0.609, p = 0.077; table S9). However, our dataset was not specifically tested for this kind of analysis and these results should be observed as hypothesis creation rather than strict hypothesis testing.
Signals of selection and ecotype-association As a complementary approach to the Fst outlier scans, we also identified loci under selection in each evolutionary lineage and loci associated with ecotype. 599 (out of 6,039) and 666 (out of 4,475) SNPs were identified to being under positive selection in the Atlantic and Siberian lineage, respectively (fig. S20). These loci were associated with the first 10 and 14 axes of genetic variation in the Atlantic and Siberian, respectively, and were under selection within and across populations. Loci under selection were widely distributed across the genome in both lineages, although some clusters could be observed ( fig.  S20). Only 18 loci under selection were shared across lineages (expected number of shared loci by chance: mean = 14, 95%-CI = 8 -22). Contig985 also contained loci (but not the same SNPs) under selection in both lineages. Genes associated with loci under selection were involved in a wide range of different biological functions (table S21), but none of the GO-terms or KEGG pathways were significantly enriched. Cell differentiation (GO:0030154) is the only biological process (GO-term) that was shared among lineages, although similar functions were under selection, such as cellular and developmental processes, neural development or signalling pathways.
Furthermore, we performed genome-wide association analysis using LMM and BSLMM approaches based on lineage-specific imputed SNP datasets. Correlation between non-imputed and imputed SNP datasets (allele frequencies for SNPs with more than 20% missing data) of 0.92 -0.97 (>0.98 for all SNPs), suggests the imputation did not introduce a bias in allele frequencies. We identified a total of 35 and 228 candidate loci that were associated with ecotype in the Atlantic and Siberian lineage, respectively ( fig. S22). 8 (Atlantic) and 11 (Siberia) of those loci were identified using both methods, the univariate LMM and the BSLMM. Genes associated with ecotype in the Siberian lineage (table S20) were significantly enriched for genes involved in habenula (GO:0021986) and epithalamus (GO:0021538) development (table S18, FDR < 0.1). Genes involved in ECM-receptor interaction (dre04512) were colocalised with loci under selection and loci associated with ecotype divergence in the Siberian lineage. None of the pathways in Atlantic populations were significantly enriched (table S20) but for example the adrenergic signaling in cardiomyocytes (dre04261) KEGG pathway was under selection in Siberian populations (table S20 and S21).
Two and 51 of the candidate loci associated with ecotype were also detected to be under selection in the Atlantic or Siberian population, respectively, and even though none of the candidate loci were directly shared across lineages, we found that Contig1512 contained loci associated with ecotype and under selection in both lineages. Genes located on this contig were significantly enriched for functions related to the ErbB signalising pathway (dre04012). ErbB receptor tyrosine kinases link the binding of extracellular growth factor ligands to intracellular pathways, regulating processes such as cell proliferation, differentiation or apoptosis (97), and these pathways are suggested to play an important role in skeletogenesis (98).
Genetic architecture underlying ecotype divergence Hyper-parameters inferred using the BLSMM were used to investigate the genetic architecture underlying ecotype evolution in the Atlantic and Siberian lineages (table S22). We found that a large proportion of phenotypic variation is explained by the genotype (PVE), ranging from 0.9676 (95%-CI: 0.8617-0.9999) in the Atlantic lineage to 0.9992 (95%-CI: 0.9967 -0.9999) in the Siberian lineage. However, in the Atlantic lineage only 63.1% of PVE was explained by 51 sparse (major-) effect loci (n.gamma), whereas in Siberia nearly all of the PVE (PGE = 96.8%) was explained by loci with sparse effect (n.gamma = 91). Overall, large proportions of ecotype divergence were explained by genotype in both lineages.
RNA-seq data processing and alignment. 79.17 % ± 0.747 (mean ± SEM) of RNA-seq reads could be uniquely mapped to the Artic charr draft genome. The majority of the unmapped reads could not be mapped because they were too short after the trimming step. In total (with multiple mapping) between 15 and 37 million reads per individual were mapped.
We detected a wide range in the extent of differential gene expression between ecotypes, from complete separation in the PCA (i.e. Dughaill in fig. S24) up to relatively strong overlap among ecotypes (i.e. in Awe; fig. S24). In Kamkanda, we find a similar relationship between ecotypes based on gene expression compared to SNP data, with the benthivorous and piscivorous ecotypes being more similar to each other compared to the planktivorous ecotype, which splits of along PC1 from both other ecotypes. Ecotype pairs from different populations also showed very high overlap in differentially expressed genes (DEGs) and nearly all pairwise comparisons were significant (table S14). The only two non-significant pairwise comparisons were across non-parallel ecotypes pairs, comparing the benthivorous -piscivorous ecotype pair from Kamkanda to the benthivorous -planktivorous ecotypes from Tay and Awe. We further found a range of differentially expressed genes that were shared between three (122 DEGs), four (19 DEGs) and five (4 DEGs) out of seven ecotype pairs (table S15 and S18). Out of the 19 DEGs that were differentially expressed in four ecotype pairs, seven transcripts belonged to different haemoglobin alpha and beta subunits (fig. S26). These HBA and HBB subunits were all expressed in the same direction between ecotypes within populations but not always in the same direction across ecotypes. Other shared genes include heat shock proteins HSP16.1, a growth inhibitor (as-related and estrogen-regulated growth inhibitor; Fig. 3E) or mid1-interacting protein 1-B-like (table S18). Four genes were shared across five ecotype pairs, of which three were characterized (Fig. 3C, table S18). The fourth transcript belonged to an uncharacterized gene (table S18). In general, these genes showed comparable expression patterns across populations.
Furthermore, we detected 13 modules containing co-expressed genes that were also correlated with ecotype (benthivorous or planktivorous) across populations ( fig. S25, table S16). Genes within four modules were enriched for a range of different functional pathways (KEGG; table S19) related to growth and development (TGF-beta signalling pathway, ECM-receptor interaction) or metabolism (Glycerolipid metabolism).  :   Fig. S1. Map of sampling sites in the Atlantic lineage (Scotland) and the Siberian lineage (Transbaikalia). Exact coordinates, population abbreviations and sample sizes are given in table S1. The following colour code was used: orange -benthivorous, blue -planktivorous, greenpiscivorous, dark red -insectivorous, grey -unimodal-planktivorous/unassigned.                   In the Siberian lineage, PC1 to PC4 explain large proportions of the genetic variation, more than the non-parallel ecotype x lake interaction, but never more than the unique evolutionary history of each population. Results of the statistical analysis are given in table S13.   Tay  Table S1. Sampling sites and sample sizes for populations used for phenotypic and genomic analysis of Arctic charr from the Atlantic and Siberian lineage.              Table S22. Hyperparameters describing the genetic architecture underlying ecotype divergence in each lineage inferred using the BSLMM in GEMMA. Given are the mean and 95%-confidence interval (CI) for each hyperparameter.