Introduction

Genetic diversity is critical to all biodiversity because it provides the basis for populations to genetically adapt and survive in rapidly changing environments (Pauls et al. 2013). Maintaining this diversity in wild species, through large, temporally stable populations that are well-connected to each other via genetic exchange (gene flow), maximizes adaptive potential and resilience towards environmental perturbations e.g., climate change (Des Roches et al. 2021). Mapping and monitoring genetic variation to gain knowledge on spatio-temporal variability patterns and the processes shaping them is of fundamental importance (Waples 1989; Laikre et al. 2008; Schindler et al. 2010, 2015).

The need for monitoring genetic diversity has long been highlighted in science (Ryman and Ståhl 1981; Schwartz et al. 2007) as well as in policy; the most important international policy for biodiversity – the Convention on Biological Diversity (CBD; www.cbd.int)—recognizes the need for monitoring biological variation at the levels of genes, species and ecosystems. However, conservation programs that include monitoring at the gene level are still largely missing, and implementing the CBD for genetic diversity has lagged behind other levels of biodiversity, in spite of extensive scientific knowledge on how to study temporal genetic variation (Laikre et al. 2010a; Bruford et al. 2017; Klütsch and Laikre 2021).

The situation now appears to be changing for the better. A new Global Biodiversity Framework was recently adopted by the CBD with commitments to maintain genetic diversity within and between populations to secure adaptive potential (CBD 2022a) and an associated monitoring framework with indicators for genetic diversity (CBD 2022b). Both these outcomes were suggested and supported by scientists (Hoban et al. 2020, 2021, 2022; Laikre et al. 2020, 2021; Kershaw et al. 2022; O'Brien et al. 2022). Sweden is among the countries leading these developments by elaborating a national program to monitor genetic diversity of species selected in science-management collaboration (Johannesson and Laikre 2020; Posledovich et al. 2021; Andersson et al. 2022; Thurfjell et al. 2022). The present study represents part of such work.

Salmonid fish species have been subjected to population genetic research ever since the allozyme technique became available for large scale screenings of wild populations in the late 1970s (Allendorf et al. 1976; Ryman et al. 1979; Ryman 1981). They are of high socio-economic importance for commercial, recreational, and traditional fisheries and subject to both extensive harvest and release programs to maintain or increase fisheries (Laikre et al. 2010b). Genetic effects of human practices are confirmed by temporal genetic studies, including reduced genetic diversity following isolation due to habitat alteration (Guinand et al. 2003; Carim et al. 2016; Yamamoto et al. 2019), reduced or changed genetic composition from released or escaped farmed fish (Laikre et al. 2010b; Perrier et al. 2013, 2016), and phenotypic and/or genetic changes from selective fishing (Allendorf et al. 2008). In protected and less disturbed areas, relatively stable levels of population genetic diversity are reported (Vähä et al. 2008; Charlier et al. 2012; Palmé et al. 2013), but deviations from this pattern exist (Araguas et al. 2017).

A salmonid species subjected to relatively few temporal genetic studies is the Arctic charr (Salvelinus alpinus). This species represents the most northernly distributed salmonid (Klemetsen et al. 2003) with an adaptation to cold-water and poor tolerance to temperature change (Knudsen et al. 2016). This makes the species vulnerable to ongoing anthropogenic climate changes. For example, the range of Arctic charr in Sweden is predicted to decrease by 73% within the coming 80 years due to climate change (Hein et al. 2012). Mountain lakes are suggested to be particularly important for conservation of Arctic charr in Scandinavia because those habitats will continue to provide refugia for Arctic charr in the future (Hein et al. 2012).

Arctic charr populations are already threatened in places where severe climate changes are expected, e.g., Greenland (Sparholt 1985). However, to our knowledge, the first temporal study of genetic variation in Arctic charr found stable levels of within and between population genetic diversity in seven sampling localities in southern Greenland during a 60-year period (Christensen et al. 2018a).

Here, we monitor genetic diversity in Arctic charr in three small alpine lakes in central Sweden over a 40-year period. One of the lakes was historically only inhabited by Arctic charr but brown trout (Salmo trutta) was introduced to the system just after our first sampling. The other two lakes naturally include Arctic charr and brown trout and have not been manipulated. All three lakes and their surroundings became legally protected after the first sampling. We use whole-genome sequencing to explore whether levels of genome-wide genetic variation, inbreeding, and mutational load vary among lakes and if levels have changed over time. We apply newly adopted indicators to quantify rates of change and investigate potential indications of selective differences within and between lakes over time.

Material and methods

Samples

We used Arctic charr samples collected from three lakes: Lakes Ånnsjön, Lilla Stensjön, and Lilla Bävervattnet in mountain areas of Jämtland County, central Sweden, representing landlocked populations (Fig. 1, Table S1). Lake Ånnsjön is relatively large (57 km2), while both Lilla Stensjön and Lilla Bävervattnet are small lakes of 0.31 and 0.13 km2, respectively. They are all located in the upper parts of their respective water system c. 530–700 m above sea level. There is no migratory contact between them, although they all eventually drain into River Indalsälven that flows into the Baltic Sea. Lakes Ånnsjön and Lilla Stensjön are naturally inhabited by both Arctic charr and brown trout while Lake Lilla Bävervattnet and downstream lakes were historically void of brown trout and an artificial introduction of brown trout was conducted at one point in time in 1979 (Palm and Ryman 1999). All lakes are located in protected areas; Lakes Lilla Bävervattnet and Lilla Stensjön in Hotagen Nature Reserve (N2000 SE0720199) protected since 1993 and Lake Ånnsjön in N2000 SE0720282 protected since 2003.

Fig. 1
figure 1

Location of the three studied lakes in the mountainous area of central Sweden. Samples were collected in the 1970s and the 2010s from each lake

These lakes were chosen because they were included in one of the first large scale screenings of genetic diversity of salmonids in Sweden (Ryman et al. 1979; Ryman and Ståhl 1981; Andersson et al. 1983) and samples from the 1970s are stored in a frozen tissue bank held at the Department of Zoology at Stockholm University. Present day samples were collected in 2017 and 2018 and are also stored in the frozen tissue archive. Length, weight and sex were collected for all fish and muscle tissue was used for DNA extraction. We sequenced 10 individuals from each lake and point in time, resulting in six samples (n = 10 per sample) of whole genomes investigated resulting in a total of 60 analysed individuals (Table S1).

DNA extraction, library preparation, and sequencing

Genomic DNA was extracted from c. 50 mg muscle tissue from each of the 60 individuals using the KingFisher cell and tissue DNA kit (ThermoScientific, MA, USA) according to the manufacturer´s instructions, which included the degradation of RNA, and eluted in 100 μl elution buffer. DNA quality was assessed by electrophoresis on a 1% agarose gel by subjectively evaluating the proportion of high-molecular weight DNA relative to degraded DNA. Double-stranded DNA was quantified using a Qubit fluorometer (ThermoScientific, MA, USA) and normalised to 30–50 ng/μl.

The samples were sent to the National Genomics Infrastructure (NGI) at the Science of Life Laboratory (SciLifeLab.se), Stockholm, Sweden. NGI constructed PCR-free paired-end libraries using the Illumina TruSeq PCR-free library preparation kit with an average insert size of 350 bp followed by Illumina NovaSeq 6000 S2 sequencing using read length 150 bp. Each individual was sequenced in two lanes.

Individual whole-genome sequence data processing and variant calling

There is currently no high-quality reference genome available for Salvelinus alpinus. Therefore the sequenced reads were aligned against the Salvelinus sp. reference assembly (Christensen et al. 2018b; https://www.ncbi.nlm.nih.gov/assembly/GCF_002910315.2/#/def) using BWA mem v0.7.17 (BWA; Li and Durbin 2009), and sorted using SAMtools v1.8 (Li et al. 2009). This generated two bam files for each individual (one per lane), which were merged using SAMtools. Duplicates (probably optical ones; https://gatk.broadinstitute.org/hc/en-us/articles/360046222751-MarkDuplicates-Picard-) were marked using PICARD v2.10.3 MarkDuplicates (https://broadinstitute.github.io/picard/), resulting in approximately 0.05% of the reads marked as duplicates per individual. Final bam file statistics were checked with Qualimap v2.2.1 (García-Alcalde et al. 2012).

Individual genomic variant call format files (gVCFs) were generated with HaplotypeCaller from the Genome Analysis ToolKit (GATK) v3.8 (McKenna et al. 2010), and joint genotyping of all individuals was performed with GATK GenotypeGVCFs. After visual inspection of histograms of all filtering parameters, we applied a hard filter approach using GATK’s VariantFiltration tool to filter out low quality variants; separately for SNPs (QD < 2.0, MQ < 40.0, FS > 10.0, MQRankSum < -5.0, ReadPosRankSum < -5.0, SOR > 5.0) and indels (QD < 2.0, FS > 10.0, ReadPosRankSum < --5.0, SOR > 5.0 (DePristo et al. 2011). VCFtools v0.1.15 (Danecek et al. 2011) was used to apply additional filters: only bi-allelic single nucleotide positions with a minor allele frequency (MAF) ≥ 0.01 were retained. Further, SNPs were screened for deviation from Hardy–Weinberg equilibrium (p = 1e−06). SNPs with too low (< 1) or high (> 50) read depth in any individual were also removed as those are either not well representative or are possibly located in duplicated or repetitive regions. We used BCFtools v1.8 (Li et al. 2009) to annotate the ID field of the joint VCF file.

Sequenced reads were also aligned against the S. alpinus reference mitochondrial (mt) genome (MN957796; Oleinik et al. 2020) from the Scandinavian peninsula. Mapped reads were further processed, and variants were called and hard filtered using the same pipeline as described above.

Genomic diversity

We used VCFtools v0.1.15 (Danecek et al. 2011) to estimate heterozygosity within populations at each of the two sampling occasions. Observed heterozygosity (Ho) was estimated as the proportion of heterozygous sites, calculated by dividing the number of heterozygous sites by the total number of SNPs (across all populations) for all individuals (e.g. Jensen et al. 2021). We estimated expected heterozygosity (He) using mlRho v2.7 (Haubold et al. 2010), by estimation of the population mutation rate (θ), which approximates the per-site expected heterozygosity under the infinite sites model. First, we estimated per-individual genomic diversity from individual bam files in mlRho. We then filtered out bases and reads with quality below 30, and reads < 4 depth of coverage. Finally, He per 1,000 bp was estimated for each individual using the remaining data. Finally, nucleotide diversity was estimated in pixy v1.2.5.beta1 (Korunes and Samuk 2021) using 5 kb non-overlapping windows including invariant sites. Differences in diversity between populations were tested with one-way ANOVA and temporal change within populations with t-tests in R (R Core Team 2018).

Population divergence

Spatio-temporal population structure was assessed through pairwise FST between lakes as well as between time points within lakes (Weir and Cockerham 1984) in VCFtools for both the nuclear (across 5 kb windows) and the mitochondrial genome (per SNP). Manhattan plots of FST were created using ggplot2 (Wickham et al. 2016) in R v4.1.1 (R Core Team 2018). We also performed a principal component analysis (PCA), as implemented in PLINK v1.90b4.9 (Purcell et al. 2007). For the PCA, a ‘pruned’ SNP data was derived to avoid bias due to linkage by using the ‘indep’ function of PLINK, using window and step size 50 (e.g. Purcell et al. 2007). A variance inflation factor of 2 was used which recursively removes SNPs within sliding windows if regression coefficient (R2) > 0.5. This yielded 761,173 putatively independent loci, hereafter referred to as our SNP subset.

Dendrograms were constructed using SNPs from nuclear and mitochondrial genomes. Allele frequencies for all of the available SNPs in the six samples (three lakes and two timepoints per lake) were estimated using Stacks v2.53 (Catchen et al. 2013), and applied as input for TreeMix v1.12 (Pickrell and Pritchard, 2012). TreeMix constructs a maximum likelihood phylogeny based on allele frequencies and compares the covariance structure calculated for an estimated dendrogram to the observed covariance between populations. Migration events were not included as the lakes are isolated from each other and migration over time within the same lake violates TreeMix assumptions. We used blocks of 5,000 SNPs (total 619 blocks = 3,095,000 SNPs) as input for TreeMix for nuclear genomic data, while default settings were used for the limited number of SNPs from the mitochondrial genome.

Inbreeding

Inbreeding coefficients (FROH) were based on the fraction of the genome covered with long stretches of homozygous genotypes, so called runs of homozygosity (ROH), and their total length (LnROH; Magi et al. 2014; Gomez-Raya et al. 2015; Kardos et al. 2017). We identified ROHs using the sliding-window approach in PLINK v1.90b4.9 (Purcell et al. 2007). A set of strict parameters were employed following recommendations for similar coverage data as our 13X (Ceballos et al. 2018; see our Results Sect. "Individual whole-genome resequencing"): sliding window size of 1,000,000 (homozyg-window-snp 1,000,000) with no more than 3 heterozygous sites per window to assume a window as homozygous (homozyg-window-het 3), and defining homozygous segments as ROHs if they included ≥ 50 SNPs (homozyg-snp 25) and covered ≥ 300 kb (homozyg-kb 300). Default values were used for additional settings. We calculated FROH as the summed length of ROH divided by the total length of the autosomal genome (Kardos et al. 2018). ROHs were separated into two length classes: length ≥ 300 and < 1,000 kb, and length ≥ 1,000 kb, with FROH computed for each length class.

Mutational load

To estimate the levels of mutational load in each individual we used an in-development version of the GenErode pipeline (Kutschera et al. 2022, https://github.com/NBISweden/GenErode/). First, each samples’ raw data were mapped to the reference genome of the lake trout (Salvelinus namaycush; Smith et al. 2022), a closely related species using the ‘modern track’ of the pipeline. This includes removing sequencing adapters with Trimmomatic (Bolger et al. 2014), mapping trimmed paired-end reads with BWA mem, index and sorting the aligments with samtools, removing possible duplicates with Picard MarkDuplicates, and realigning reads around indels with GATK IndelRealigner v3.4.0 (McKenna et al. 2010). Variant calling was performed in each sample individually with Bcftools v.1.8 (Li et al. 2009) excluding reads with less than 30 mapping quality as well as sites with less than 30 base quality and less than 5 × coverage. Only biallelic variants (SNPs) segregating from the reference genome in at least one sample were kept. SNPs allocated in repetitive regions of the reference genome as identified by Repeatmasker v 2.0.1 (Smit and Hubley 2015) and Repeatmodeler v 4.0.9 (Smit et al. 2017) were also excluded. Finally, we excluded all heterozygote sites for which the ratio of alternative/reference supporting reads was lower than 20% or higher than 80%. The remaining SNPs were annotated to infer all variant effects using SNPeff v.4.3 (Cingolani 2022). For each sample, we report the number of high impact variants (loss of function, LOF) normalized by the number of sites analyzed for that sample, where homozygote variants are counted twice and heterozygote variants once.

Effective population size

Two approaches were used to estimate the population effective size (Ne) of each lake, both relying on the SNP subset used in PCA-analysis. First, GONE (Santiago et al. 2020) was employed to estimate linkage disequilibrium effective population sizes (NeLD) for each point in time for each lake. GONE uses a computational framework to infer recent population history from the observed spectrum of linkage disequilibrium (Santiago et al. 2020). Here, the SNP subset was phased using Beagle v5.1 (Browning and Browning 2007) and then converted into a GONE input file using PLINK v1.90b4.9 (Purcell et al. 2007). Default settings were used in GONE. As the generation length of Arctic charr is unknown to us, we assumed a generation time of 7 years as estimated for brown trout in this area (Charlier et al. 2012; Palmé et al. 2013), which, based on the number of years between sampling occasions for separate lakes, translates into a 5–6 generation interval over which Ne is estimated (cf. Table S1). We report NeLD from the most recent generation (i.e., one generation before present) and from 6 generations ago.

Second, we used TempoFs (Jorde and Ryman 2007) to estimate variance effective size (Nev) in each of the studied lakes. TempoFs quantifies allele frequency shifts between temporally separated samples using the Fs measure (Jorde and Ryman 2007). TempoFs can run for a maximum of 200,000 loci, so the SNP subset was further filtered for MAF ≥ 0.05 in all samples before 200,000 SNPs were randomly selected using the ‘thin’ function in PLINK. TempoFs provides confidence intervals for the point estimates while GONE does not.

Indicators for genetic diversity

The CBD monitoring framework for the new Global Biodiversity Framework (GBF) applies Ne > 500 as a so-called headline indicator which all nations need to report on (CBD 2022b). Ne is also a main indicator in the Swedish national program which applies three indicators (Andersson et al. 2022) all using genetic measurements referred to as Essential Biodiversity Variables (EBVs) for genetic diversity (Hoban et al. 2022). We used estimates of observed and expected heterozygosity, nucleotide diversity, and inbreeding detailed above from past and present samples to calculate potential changes in these metrics over time in each lake, quantified as ΔHo, ΔHe, Δπ, ΔFROH for the within population genetic diversity indicator which is called ΔH (Andersson et al. 2022). For the Ne-indicator we used estimates of effective population size from TempoFs (NeV) and the GONE analysis (NeLD; see above).

The Swedish national monitoring program also uses an indicator to quantify change in between population genetic diversity potentially caused by decrease or increase in gene flow (ΔFST; Andersson et al. 2022). Since the three lakes assessed here are isolated from each other, this indicator is not relevant, however, we still provide the metrics for documentation; FST was estimated between each pair of populations at both points in time in order to estimate ΔFST. The CBD indicator for the between population genetic diversity component is “the proportion of populations maintained”—a so-called Complementary indicator in the CBD GBF monitoring framework (CBD 2022b). In the present case, that indicator reflects whether Arctic charr remain in all of the three lakes studied here.

We apply the methods and threshold values presented by Andersson et al. (2022) for assessing the indicators. In brief, three indicator signals (i.e., “Alarm”/ “Warning”/ “Acceptable”) were applied for each diversity metric and indicator (see Appendix S1 for details).

Genome scans of selection

We explored indications of adaptive divergence between lakes and selection acting within lakes over the evolutionary short, contemporary time period monitored here.

Local adaption

First, we identified signatures of divergent allele frequencies between lakes, indicative of local adaptation. A modified version of POWSIM (Ryman and Palm 2006) was used to investigate whether the distribution of FST values in our data was consistent with the expectation for selectively neutral loci, and where deviations from such a distribution could reflect directional selection (cf. Lamichhaney et al. 2012; Saha et al. 2022). We compared the observed genome-wide distribution of FST between two lakes with a simulated “expected” distribution under drift only. The observed distribution was based on per SNP estimates. We used a conservative approach, defining candidates of selection as SNPs with an observed FST above the largest value of the expected distribution (cf. Saha et al. 2022). We performed this analysis for each pairwise comparison of lakes and for each time point separately.

Selection over time within lakes

We used RAiSD v2.9 (Alachiotis and Pavlidis 2018) to scan for directional selection acting within each lake over time. RAiSD identifies selective sweeps by searching for distinct patterns expected to surround loci subject to positive selection; localized reduction of polymorphisms, shifts in allele frequencies, and a specific pattern of linkage disequilibrium expected around mutations (Smith and Haigh 1974; Braverman et al. 1995; Kim and Nielsen 2004). RAiSD uses these three sweep signatures to score genomic regions in a μ-statistic using a sliding-window algorithm. We used default settings with a sliding window size of 50 SNPs for each sample of lake and point in time. Outliers were defined as SNPs with μ values above a threshold corresponding to the 99.95th percentile, as suggested by Alachiotis and Pavlidis (2018). We then focused on sweep signals identified in present (2010s) samples that were lacking from past (1970s) ones for each lake in order to identify very recent selective sweeps that may have occurred during our monitoring period, hereafter referred to as “temporal sweeps”.

Gene ontology

In order to assess the biological meaning of SNPs putatively under selection, a gene ontology (GO) set enrichment analysis (GSEA) was performed to associate these SNPs to common gene functions. First, functional annotation of the Salvelinus sp. reference was performed on the EggNOG v5.0 web-interface (http://eggnog-mapper.embl.de/; Huerta-Cepas et al. 2019) using Salvelinus sp. protein FASTA files from Ensembl (GenBank accession GCA_002910315.2). Second, we converted the NCBI Salvelinus sp. annotation release 101 from GFF to bed format using bedtools v2.27.1 (Quinlan and Hall 2010), only including coding regions, and extracted all genes that overlapped with the identified putatively selective SNPs using bedtools intersect (Quinlan and Hall 2010). Third, the R package topGO (Alexa et al. 2016) was used to test for over-representation of GO biological processes using a node size of 5 and the ‘classic’ algorithm to account for structural relationships among GO terms. Finally, GO terms with p ≤ 0.01 were retained and then filtered in REVIGO (Supek et al. 2011) to remove redundant GO terms. Treemaps were drawn in R v3.6.3 to visualize enriched GO superclusters. We also investigated the functional annotations of our putatively selective SNPs by annotating our VCF file in SnpEff v.5.0 (Cingolani et al. 2012).

Results

Individual whole-genome resequencing

Over 1,350 million reads were generated, corresponding to between 13 and 28 Giga base pairs (Gb) of raw data per individual (Table S2). Approximately 97% of the raw data per individual mapped to the Salvelinus sp. reference genome. An average mapping quality of 37 and depth of coverage c. 13X was observed (Table S2). After quality filtering, we retained 3,421,773 and 77 bi-allelic SNPs in the nuclear and mitochondrial genomes, respectively.

Genomic diversity

We observed significant difference in genome wide diversity (π, Ho, and He) among lakes for past and present samples respectively (Table 1). All three diversity measures were highest in Lake Ånnsjön and lowest in Lilla Stensjön (Fig. 2a, Fig. S1).

Table 1 Genomic diversity from individual whole-genome sequencing data of Arctic charr from three lakes collected in the 1970s and 2010s (Fig. 1, Table S1). Diversity metrics are averaged per population, including: nucleotide diversity (π), observed heterozygosity (Ho), expected heterozygosity (He), and inbreeding (FROH). Values of 95% confidence intervals are provided in brackets. Results from significance testing of homogeneities among values from different lakes at each point in time is provided in the last two rows. Significant p-values are in bold
Fig. 2
figure 2

Diversity within Arctic charr populations from three lakes and two time points; a Nucleotide diversity (π) within populations over time. Boxplots depict genome-wide π per lake and sampling occasion. b Inbreeding (FROH). Small segments of FROH in cyan at the bottom of each bar imply FROH estimated from ROH ≥ 1 Mb, while the upper segments are derived from ROH < 1 Mb. Error bars represent values within 95% confidence intervals. Significant temporal difference (p < 0.05, t-test) within lakes are shown with asterisks (*). c Per sample levels of mutational load measured as number of LOF variants per 100,000 sites (100 k). In each sample (lake and point in time), boxplots represent minimum, maximum, median, first quartile and third quartile values. All individual values are represented with black dots.

Inbreeding (total FROH) differed significantly among lakes for both past (ANOVA; F = 121.50, p = 3.20 × 10–14 and present samples (ANOVA; F = 71.51, p = 1.60 × 10–11); Table 1, Fig. 2b). Inbreeding was highest within Lake Lilla Stensjön and lowest in Ånnsjön. Short ROH accounted for the major proportion of the total FROH in all samples (Table S3). At most, 37 (0.53%) long ROH segments (of length ≥ 1 Mb) were found in fish from Lake Lilla Stensjön from the 2010s (Table S3). In genomes from Lake Ånnsjön from the 2010s, no ROH exceeded 1 Mb (Table S3).

Estimates of effective population sizes using linkage disequilibrium (NeLD) and variance Ne (NeV) were not concordant (Table 2). Current NeLD was 229, 378, and 64, while NeV was 60, infinity, and 310 for lakes Lilla Bävervattnet, Lilla Stensjön, and Ånnsjön, respectively. Significant allele frequency change across the nuclear genome was observed within all three lakes, with average FST between time points ranging from 0.019 in Lake Lilla Stensjön, to 0.046 within Lake Lilla Bävervattnet (Table 3, Fig. 3). Temporal FST estimates based on SNPs from the mitochondrial genome did however not significantly deviate from zero in any comparison and ranged between 0.004 and 0.018 (Table S4).

Table 2 Estimates of effective population size (Ne) of Arctic charr from three lakes. Ne per lake is estimated with the temporal method (NeV) in TempoFs (Jorde and Ryman 2007), and 95% confidence intervals are provided in brackets. Ne based on patterns of linkage disequilibrium (NeLD) are provided per lake and time point and were estimated in GONE (Santiago et al. 2021)
Table 3 Pairwise, FST (Weir and Cockerham 1984) between Arctic charr from three lakes collected in the 1970s and 2010s across nuclear genomes (window size = 5,000 bp). FST-values from temporal comparisons within lakes are marked in bold. Values of 95% confidence intervals are shown in brackets. See Table S4 for FST-estimates across mitochondrial genomes
Fig. 3
figure 3

Allele frequency change (FST) over time within each lake. FST was estimated by comparing past and present samples from each of the three lakes. Each dot represents a 5 kb window

Population divergence

Samples were separated into three distinct clusters corresponding to the three lakes in the dendrogram (Fig. 4) and PCA (Fig. S2). Genome-wide, nuclear, divergence did not reflect geographic distance (Table 3, Fig. 4). Largest divergence was estimated between the neighbouring Lakes Lilla Bävervattnet and Lilla Stensjön, with mean genome wide FST = 0.22, both for 1970s and 2010s samples (Table 3). Lowest divergence was found between Lakes Ånnsjön and Lilla Bävervattnet (mean FST = 0.16 for both points in time). Generally, FST ranged between 0 and 1 for all pairwise comparisons between lakes as well as within lakes over time.

Fig. 4
figure 4

Genetic relationships among lakes for each sampling occasion. The dendrogram was made in TreeMix assuming no migration between lakes and (n = 3,421,773 SNPs) SNPs. The scale indicates the proportion of genetic divergence per unit length of the branch (indicated by the scale length)

The dendrogram based on mitochondrial SNPs largely agreed with that of the nuclear ones (Fig. S3), suggesting largest divergence between Lakes Lilla Bävervattnet and Lilla Stensjön. However, in contrast to the nuclear genomes, lowest divergence across mitochondrial genomes existed between Lakes Lilla Stensjön and Ånnsjön, irrespective of point in time (mean FST = 0.15; Table 3).

Mutational load

We estimated an average of 9.2 LOF variants per 100,000 sites across samples. We find no change in mutational load estimates between the 1970s and 2010s samples in any of the lakes (Fig. 2c). However, both 1970s and 2010s samples from Lake Ånnsjön displayed lower mutational load than the samples from the other two lakes. We then compared the mutational load levels per sample in either homozygous or heterozygous state (Fig. S4). We find that homozygous LOF variants, also known as realized load, are the ones driving the patterns observed above, while LOF variants in heterozygous state, masked load, show no differences between lakes or time-points, except between the 1970s and 2010s samples in Lake Lilla Bävervattnet (TukeyHSD, p = 0.047; Fig. S4).

Genetic indicators

Genetic diversity within populations as assessed by the ΔH-indicator, has remained at “Acceptable” levels within all three lakes over the current sampling period (Fig. 5, Table S5). Of the three lakes, significant temporal change was only observed in Lake Ånnsjön, with increased Ho and π (t-test; H; t (9) = 3.90, p = 0.0036, π; t (9) = 2.32, p = 0.036; Fig. 2, 3, Table S5). No significant change in He was observed over time in any lake (p > 0.05; Table S5). Furthermore, FROH decreased significantly over time within Lake Ånnsjön (t-test; t (18) = 2.43, p = 0.026) and remained stable within the other two lakes (Lilla Stensjön; t (18) = 0.44, p = 0.067, Lilla Bävervattnet; t (18) = 1.94, p = 0.068).

Fig. 5
figure 5

Genetic indicator classifications for Arctic charr. a within population diversity and b between population diversity. The colored circles indicate the following classifications; green = “acceptable”, yellow = “warning”, and red = “alarm”. Arrows inside circles show the direction of change, with horizontal arrows meaning apparent stability (no change); filled arrows indicate that the change is statistically significant (p < 0.05)

We used the NeV values for the Ne-indicator since these generally exceed those from NeLD and using largest values is recommended for populations that are not isolated (Ryman et al. 2019). Lilla Stensjön was the only lake classified as “Acceptable” (NeV > 500), while Lilla Bävervattnet and Ånnsjön were classified as “Warning” (NeV < 500; Fig. 5). The FST-indicator was classified as “Acceptable”; all populations have remained and temporal changes in FST between populations were within acceptable thresholds (Table S5, Fig. 5).

Local adaptation

SNPs with excessive divergence between populations in separate lakes identified by POWSIM were marked as candidates of local adaption to conditions at the site of individual lakes. We found approximately 5,000 such SNPs when comparing Lake Ånnsjön to either of Lilla Bävervattnet and Lilla Stensjön (Table S6). The number of SNPs found in these two comparisons decreased over time. No SNP candidates were found between Lakes Lilla Bävervattnet and Lilla Stensjön (Table S6).

With regard to functional impact, most of the SNP candidates for local adaptation are modifiers (Table S6). Six SNPs of high impact were identified, found in comparisons of Lakes Lilla Stensjön and Ånnsjön, three from each sampling occasion (1970s and 2010s). All but one of these were found within splice sites, and the sixth SNP within a frame-shift mutation. None of the 19 surrounding genes are to our knowledge described in bony fish. Assignment of significantly enriched GO terms of the SNPs candidates into higher order biological processes suggested those were primarily associated with immune, metabolic, developmental, and reproductive activities (data not shown).

Of all the SNP candidates of local adaptation, six were found in both comparisons of Lake Ånnsjön to either of Lilla Bävervattnet and Lilla Stensjön (Table S7). Most of these are of moderate or low impact and fall within introns or intergenic regions. One of the five genes in which the SNPs were found, loc111962718, has to do with salinity tolerance in Arctic charr and Atlantic salmon (Norman et al. 2012).

Temporal selection within lakes

RAiSD identified signals of selective sweeps acting in several regions of the Arctic charr genome. We focused on putative “temporal sweeps”, identified as putative sweeps found in present samples that were absent from past samples within each lake. Many of the temporal sweep signals identified within Lake Ånnsjön were also found within Lilla Stensjön, and vice versa (not shown). In contrast, two temporal sweeps were unique to Lake Lilla Bävervattnet and not found in the other two lakes; one such sweep may be the result of positive selection, as it was present among 2010 samples but absent among those from the 1970s. This putative sweep was identified on linkage group 31 (scaffold NC_036870.1, Fig. 6a) and also contained SNPs identified by POWSIM as candidates of local adaptation between Lakes Lilla Bävervattnet and Ånnsjön (none were found between Lakes Lilla Bävervattnet and Lilla Stensjön). A majority of these candidate SNPs were of moderate impact and most fall within intergenic regions. Two genes were identified within the sweep that are described in ray–finned fish, both of which have to do with immunity: loc111955512 and fox06 (Li et al. 2007; Russell et al. 2008; Grammes et al. 2013; Table S8).

Fig. 6
figure 6

Two putative selective sweeps within Lake Lilla Bävervattnet that have occurred over the sampling period. RAiSD (Alachiotis and Pavlidis 2018) was used to identify sweeps unique to this lake by scanning the genomes of a present and b past samples. Candidates for selective sweeps (in blue) were taken from the top 99.95% (dashed line) of selective sweeps. Temporal sweeps (red) are outlier sweeps only found in a present samples, on linkage group 31 (scaffold NC_036870.1) and b only in past samples, linkage group 7 (scaffold NC_036847.1)

The second temporal sweep unique to Lake Bävervattnet was present among 1970 samples and found on linkage group 7 (scaffold NC_036847.1; Fig. 6b) but is not present in the 2010 sample. Negative selection may have shaped this region, as the signal is absent among 2010 samples. Here too, SNP candidates of local adaptation between Lakes Lilla Bävervattnet and Ånnsjön from POWSIM are found, although none of the overlapping genes are, to our knowledge, described in fish (Table S9).

Discussion

We have monitored genomic diversity in Arctic charr in three landlocked mountain lakes in central Sweden over a 40-year period (c. 6 generations) and find stable levels of genomic diversity but small effective population sizes. Furthermore, selective pressures may be at play across varied parts of the Arctic charr genome. Different selective pressures between lakes are suggested within a gene regulating salinity tolerance. Most prominently, signatures of selective sweeps are suggested in fish from Lake Lilla Bävervattnet where the competitive species brown trout was introduced in the 1970s. Identified genes are potentially involved in immunity

Genetic diversity within lakes

Genomic diversity measured with the three metrics appointed here (Ho, He, π) is highest within the largest lake, Lake Ånnsjön, while considerably lower in the two small lakes with lowest values observed in Lake Lilla Stensjön (Table 1; Table S1). This pattern is consistent over time and correlates with observed levels of inbreeding with lowest FROH in the large lake (Ånnsjön) and highest in Lilla Stensjön. Further, genetic load is higher in the small lakes (Fig. 2c) and this is due to higher frequency of loss of function variants in homozygous state in the small lakes while masked load levels are similar in all three lakes (Fig. S4).

Genetic diversity within presently studied lakes appears similar or somewhat higher compared to measures reported for other landlocked Arctic charr populations studied with SNPs. Christensen et al. (2018a) include one landlocked population in their study of Arctic charr in western Greenland which exhibits He = 0.173, comparing to ours ranging 0.217–0.291 (Table 1). In ten landlocked populations in eastern Canada, He varies from 0.05 to just over 0.2 (Salisbury et al. 2023). Both those studies also include anadromous populations reported to display higher levels of variation with He spanning c. 0.22–0.30 (Christensen et al 2018a; Salisbury et al. 2023) similar to our estimates (Table 1). Shikano et al. (2015) studied landlocked populations in Finland inhabiting lakes of size 0.2–4 km2, which are comparable to the sizes of Lakes Lilla Bävervattnet and Lilla Stensjön (Table S1). They use microsatellites and find considerably higher He spanning 0.359–0.674.

Our Ne estimates appear comparable to those observed in other Arctic charr populations. Shikano et al. (2015) estimate current Ne (linkage disequilibrium method; Waples and Do 2008) in the range 7–228 in their six landlocked populations in Finland. While the landlocked populations studied in eastern Canada show NeLD in the range of c. 10–250 (Salisbury et al. 2023; their Fig. 4), our current NeLD estimates are 64, 378, and 229 for Lakes Ånnsjön, Lilla Stensjön, and Lilla Bävervattnet, respectively (Table 2). Similarly, four anadromous populations in western Greenland show Ne of 179–685 (Christensen et al. 2018a) using the temporal MLNe method (Wang and Whitlock 2003). Our temporal estimates from TempoFs (Jorde and Ryman 2007) give similar estimates of 60-∞ (Table 2), but we see some discrepancies between NeV and NeLD (Table 2). The NeV estimates reflect allele frequency shifts over the 40-year period monitored. The largest shifts are observed in Lake Lilla Bävervattnet (Table 3, Fig. 3, 4) reflected in the smallest NeV in that lake, while the smallest allele frequency change is observed in Lilla Stensjön which results in an infinite NeV. In contrast, Lake Ånnsjön shows the smallest NeLD values. NeLD relates to linkage disequilibrium which increases with small population size but can also be a result of sampling of multiple populations. Since Lake Ånnsjön is the largest lake it is not unlikely that multiple populations exist in that lake, which in turn can affect the NeLD estimates. Both NeV and NeLD are affected by migration but in different ways, which is another potential reason for why the Ne values from the two approaches differ (Ryman et al. 2019).

Clearly, Ne values in Arctic charr are typically below the threshold of Ne ≥ 500 recommended for maintaining adaptive capacity of populations (Franklin 1980; Franklin and Frankham 1998; Hoban et al. 2020). However, all three studied lakes are connected to other water bodies and gene flow most likely occurs, which is supported by our observations of increased genetic diversity in two lakes and stable levels in the third. A limitation of our study is that the genetic structure over larger areas has not been conducted. Such work is needed in the future to fully understand the genetic dynamics in these systems.

Genetic change within populations over time

We find significant allele frequency shifts over time in Lakes Ånnsjön and even more pronounced in Lilla Bävervattnet with within lake FST-values of 0.03 and 0.05, respectively (Table 3). These shifts exceed those observed by Christensen et al. (2018a) of c. 0.01. In our lakes these shifts do not appear to be associated with a loss of genetic diversity, however. Lake Ånnsjön shows significant temporal increase in all diversity measures and reduction in inbreeding while Lilla Bävervattnet shows slight increase or stable levels of variation (Fig. 2, 5). These observations in Lilla Bävervattnet are particularly interesting in light of the establishment of a competitive species – the brown trout – during the 40-year monitoring period.

Prior to the release event in 1979 Arctic charr was the only fish species in Lilla Bäve vattnet and downstream lakes and the brown trout established and spread rapidly in the system (Palm and Ryman 1999; Kurland et al. 2022). Our present results indicate that as yet we cannot find any obvious negative effects of this release on overall genome-wide diversity in the Arctic charr. However, the pronounced temporal allele frequency shifts, indications of a selective sweep (discussed below), and the significant increase of LOF variants in the heterozygous state (Fig. S4b) does indicate that the release has impacted the genomic dynamics of the charr. The present sampling is too limited spatio-temporally as well as numerically to delineate the details of these perturbations. In contrast, the study of Arctic charr populations inhabiting the frigid waters of western Greenland reports lower levels of heterozygosity in all present-day samples compared to those from 1950s for the four populations where such temporal information is available (Christensen et al. 2018a, their Table 1).

Genetic diversity between lakes

The genome wide divergence between the three lakes studied here appears to be temporally stable, with FST = 0.16–0.21 in the 1970s and FST = 0.16–0.22 in the 2010s (Table 3). These values lie within ranges observed between landlocked populations of Arctic charr in Finland, where pairwise FST ranges from 0.1 to 0.4 (Shikano et al. 2015). Stability in genetic diversity over time is also observed among populations from Greenland (Christensen et al. 2018a).

Local adaptation

Salmonids exhibit local adaptation over small geographic scales (e.g. Fraser et al. 2011; Debes et al. 2017). We scanned the genome for SNPs exhibiting divergence exceeding expectations under drift in POWSIM, suggesting local adaptation in charr of separate lakes. We found most putatively adaptive SNPs when comparing Lake Ånnsjön to either of Lilla Bävervattnet and Lilla Stensjön (c. 5,000; Table S6). No putatively adaptive SNPs were identified between neighbouring lakes Lilla Bävervattnet and Lilla Stensjön, which is particularly striking given that genome-wide FST between these two lakes was the highest of all pair-wise comparisons. High overall FST between Lakes Lilla Bävervattnet and Lilla Stensjön likely reflects drift acting within the lakes, which are small, and where population numbers are likely limited (Table S1). Indeed, our measures of local effective sizes are small (< 500; Table 2). Selection may only be strong enough to override drift in the largest lake, Lake Ånnsjön. However, selective sweep signals are identified within the two smaller lakes. Therefore, it appears more likely that the lack of putatively adaptive SNPs found within Lakes Lilla Stensjön and Lilla Bävervattnet reflect difficulty to disentangle selection from drift in the present kind of system.

Alternatively, identifying most putatively adaptive SNPs between Lake Ånnsjön to either of Lilla Bävervattnet and Lilla Stensjön suggests that the selective forces in Lake Ånnsjön differ from the other two lakes. Six of the SNPs marked as putatively under selection by POWSIM were found in both comparisons of Lake Ånnsjön to Lakes Lilla Bävervattnet and Lilla Stensjön (Table S7). These SNPs may reflect local conditions within Lake Ånnsjön that are unique in comparison to the other two lakes. One of the SNPs is found within gene loc111962718 which has to do with salinity tolerance in Arctic charr and Atlantic salmon (Norman et al. 2012), which is somewhat surprising to find among landlocked populations. However, osmoregulatory performance in Atlantic salmon is linked to both salinity and temperature (Vargas-Chacoff et al. 2018). It is conceivable that environmental conditions within Lakes Lilla Bävervattnet and Stensjön are similar given their proximity and similarity, which would also explain the lack of putatively adaptive SNPs found between them. Limited environmental data supports this contention, where conductivity within both of Lakes Lilla Bävervattnet and Lilla Stensjön averages 1 mS/m25 and measures 3 mS/M25 within Lake Ånnsjön, and pH is approximately 6 in both Lakes Lilla Bävervattnet and Lilla Stensjön and 7 in Lake Ånnsjön. Further study of the environmental factors possibly driving local adaptation in present lakes is planned, e.g., attempts to associate biotic or abiotic forces to genomic variation.

One of the SNP candidates of adaptive divergence between Lakes Lilla Stensjön and Ånnsjön lies within a frame-shift mutation and the evolutionary importance of structural variants gains increasing focus (Wellenreuther and Bernatchez 2018). Such variants may reach high frequencies or fixation over few generations (Paudel et al. 2015). Structural variation is also prevalent among polyploid salmonids (Brenna-Hansen et al. 2012; Lien et al. 2016). For instance, chromosomal rearrangements play a role in salinity tolerance and migratory timing in Atlantic salmon (Wellband et al. 2019; Watson et al. 2022). Further study of structural variants in present populations is thus warranted. Additionally, most of our putatively adaptive SNPs are found outside of gene transcripts, yet the selective importance of non-coding DNA is gaining increasing attention (e.g. Hill et al. 2021).

Selection over time within lakes

We find signs of selective sweeps having occurred over the current sampling period in each of the studied lakes. Most strikingly, we observe a sweep unique to present samples from Lilla Bävervattnet. We cannot completely rule out genetic drift; effective population size for this lake is below 500 (Table 2) and temporal allele frequency change within this lake averages FST = 0.045, which is the highest of the presently studied lakes (Table 3, Fig. 3). Selective forces acting within this lake would likely need to be strong to counteract drift. However, brown trout were introduced to this lake during the sampling period and these two species are expected to compete for resources (Nilsson 1965; Forseth et al. 2003). It is therefore possible that the introduction of brown trout to Lake Lilla Bävervattnet has placed such strong selective pressures that the detected selective sweeps are indeed true. Rapid adaptation over a few generations has been demonstrated in response to diverse selective forces e.g., fishing (Therkildsen et al. 2019), urbanization (Kern and Langerhans 2018), population translocation (Willoughby et al. 2018), and ocean acidification (Bitter et al. 2019). In Darwin finches (Geospiza spp.), shifts in alleles affecting beak size were observed after only one generation following severe drought (Enbody et al. 2023).

Genetic indicators and management implications

We apply three indicators adopted by the Swedish Agency for Marine and Water Management for national monitoring of genetic diversity in aquatic species (Andersson et al. 2022; Kurland et al. 2023; Fig. 5). They have also recently been applied by the Swedish Environmental Protection Agency to a terrestrial species – the moose (Dussex et al. 2023). One of these three indicators, the Ne indicator, is a Headline indicator in the monitoring framework of the new CBD Global Biodiversity Framework (CBD 2022b; Hoban et al. 2023).

Most of our indicator Ne values are below the threshold 500 (Fig. 5, Table 2, S5). However, the within population genetic diversity indicator (ΔH) does not show any warning signals which support the contention that these populations are not isolated but maintain genetic diversity via migration.

The third indicator, ΔFST, focuses on genetic diversity between populations. It measures retention of populations as well as the level of genetic diversity between interconnected populations (quantifying potential change in FST among populations over time; Andersson et al. 2022). In our present case, the three lakes are located at such distances from each other (Fig. 1) that migration between them is not possible. Therefore, there is no biological meaning in measuring changes in FST between them. We have nevertheless done so and find that the changes of FST are within applied threshold limits (Table S5; Andersson et al. 2022). Further, since Arctic charr remains in all three lakes over the monitoring period, we find no indications of loss of populations. Monitoring the proportion of populations maintained within a species over a particular geographic area is also an indicator in the CBD monitoring framework. It is currently listed as a Complementary indicator for countries to report on (CBD 2022b; Hoban et al. 2023).

In the present case, indicators taken together suggest that maintaining connectivity in the systems which the monitored populations belong to is crucial because under isolation, genetic diversity is expected to erode rapidly due to small Ne. Such connectivity is secured at present, since all our lakes are located in protected areas.

Our finding highlights the benefit of WGS for sustainable management and conservation, as recently highlighted by Theissinger et al. (2023). The main contribution from WGS in comparison to traditional markers is the potential to assess the patterns of diversity over the entire genome as well as adaptive spatio-temporal divergence, inbreeding and genetic load (cf. Allendorf 2017).