Atlantic salmon populations reveal adaptive divergence of immune related genes - a duplicated genome under selection

Populations of Atlantic salmon display highly significant genetic differences whose molecular basis remains unresolved. These differences may result from separate postglacial colonization patterns, from diversifying natural selection and adaptation, or from a combination of both. Adaptation could be influenced or even facilitated by the recent whole genome duplication in the salmonid lineage, which resulted in a partly tetraploid species with duplicated genes and regions. To elucidate the genes and genomic regions underlying these genetic differences, we conducted a genome wide association study using whole genome resequencing data from eight populations from Northern and Southern Norway. Of ~4.5 million sequencing-derived SNPs, more than 10 % showed significant differentiation between populations from the two regions, and ten selective sweeps were identified on chromosomes 5, 10, 11, 13–15, 21, 24 and 25. These sweeps comprised 59 genes, of which 15 had one or more differentiated missense mutations. Our analysis showed that most sweeps have paralogous regions in the partially tetraploid genome, each lacking the high number of significant SNPs found in the sweeps. The most significant sweep was found on Chr 25 and carried several missense mutations in the antiviral mx genes, suggesting that these populations have experienced differing viral pressures. Interestingly, the second most significant sweep, found on Chr 5, contains two genes involved in the NF-kB pathway (nkap and nkrf), a known pathogen target that controls a large number of processes in animals. Our results show that natural selection acting on immune related genes has contributed to genetic divergence between salmon populations in Norway. The differences between populations may have been facilitated by the plasticity of the salmon genome. The observed signatures of selection in duplicated genomic regions suggest that the recently duplicated genome has provided raw material for evolutionary adaptation.


Background
In addition to being one of the most highly prized freshwater fish for recreational fishing, the Atlantic salmon (Salmo salar L.) is one of the most economically important aquaculture species worldwide. Its natural distribution is throughout the North Atlantic, ranging from Long Island Sound to Ungava Bay in the west and from Northern Portugal to the Barents Sea in the east [1]. This distribution is the result of postglacial colonization of ecosystems that became available when the glacial ice retreated about 10,000 years ago [2].
Atlantic salmon is characterised by highly significant, hierarchically structured population genetic divergence, with the largest differences observed between the European and North American lineages [3][4][5]. This divergence is also observed on a regional scale, presumably as a consequence of the colonization process that followed the retreat of the glaciers [6,7]. Moreover, differentiation exists on a local scale, for example between neighbouring rivers [8][9][10] and among tributaries within the same river, which might be explained by restricted gene flow, genetic drift and adaptation [11][12][13].
Atlantic salmon exhibit a relatively complex life history that includes spawning and juvenile rearing in freshwater followed by extended ocean migrations to the feeding grounds [14]. As a consequence, salmon go through several distinct transitions that are characterized by changes in behaviour and physiology [15]. They are also able to adapt to varying local conditions throughout their range of environments [16], exemplified by their ability to inhabit rivers with a wide range of temperatures, from Spain to the colder Arctic latitudes [17]. Previous studies have shown differences in temperature and climate to be associated with genetic differences between salmon populations [7,18], and latitude also seems to be correlated with allele frequencies of markers relevant to immune response in American and European Atlantic salmon populations, possibly due to temperature induced differences in pathogen-driven selection or other environmental factors [19][20][21].
In the wild, Atlantic salmon are constantly confronted with a range of pathogens and have consequently developed numerous innate and adaptive immune mechanisms to overcome infectious challenges [22]. Recent studies suggest that the prevalence of parasites and infectious diseases is increasing in wild populations, partly due to global warming [23,24]. Given the commercial relevance of Atlantic salmon and the recent release of a reference genome [25], particular effort should be made to identify genes targeted by natural selection in wild Atlantic salmon populations, as such knowledge could ultimately lead to optimized aquaculture practices. The potential relevance of such findings for the Atlantic salmon farming industry is exemplified by the identification of genes associated with resistance to infectious pancreatic necrosis (IPN) virus [26] and with age at maturity [27,28]. A relatively recent whole genome duplication occurred in the salmonid lineage some 80 million years ago [29], resulting in a partly tetraploid genome that is undergoing rediploidization. Consequently, the genome contains many paralogous regions that could provide raw material for evolution, as paralogous genes and regions can diversify and acquire new functions [30].
Based upon the analysis of microsatellite and SNP markers, several studies have demonstrated that there are highly significant genetic differences between Atlantic salmon populations located in the north and south of Norway [31][32][33]. However, the genomic regions and genes behind the differences have not been investigated in detail, and consequently, the potential adaptive significance of this genetic divergence remains elusive.
Recently, a genome wide association study (GWAS) based upon whole genome resequencing data revealed a selective sweep in Atlantic salmon strongly associated with age of maturation [27]. Using a similar methodological approach, the present study aimed to identify genes and genomic regions diverging between Atlantic salmon populations in the north and south of Norway. In order to achieve this objective, salmon populations inhabiting the four rivers Tanaelva, Lakselv, Altaelva and Reisaelva from Northern Norway and the four rivers Gloppenelva, Eidselva, Suldalslågen and Årdalselva from Southern Norway were chosen for resequencing using DNA pools (n = 30 fish per river, Fig. 1). The major finding in this study was the observation that diversifying natural selection has acted on immune related genes causing adaptive divergence between populations in the north and south of Norway.

Results and discussion
Whole genome sequence data from the eight selected rivers along the Norwegian coast (Fig. 1) were mapped to the most recent Atlantic salmon reference genome (AKGD00000000.4), yielding an average depth of coverage of 26.7× of uniquely mapped reads per river. SNP calling revealed 4,450,990 high-quality SNPs. To quantify the genetic difference between the populations of the chosen rivers, Hudson's estimator of Wright's fixation index (FST) [34] was calculated (Additional file 1: Table S1). A phylogenetic tree built from this distance matrix illustrates and confirms the previously reported large genetic difference between northern and southern populations of Atlantic salmon in Norway (Fig. 1). The Cochran-Mantel-Haenszel test for differences in allele frequencies between northern and southern Atlantic salmon populations revealed 474,410 SNPs with significantly different allele frequencies (0.1 % FDR, Fig. 2a). Genomic regions subjected to recent positive selection are expected to have lower heterozygosity than other regions, and if the selective pressure differs between populations, higher FST is expected [35]. An approach calculating FST and heterozygosity in 50 kb sliding windows has previously been used to identify genomic regions under selection (selective sweeps) [36]. We used this method to find selective sweeps that differ between northern and southern salmon populations in Norway (Additional file 1: Figure S1). The combined FST/heterozygosity approach suggested 10 selective sweeps that differed between the two geographical regions. The sweeps ranged from 75,000 to 575,000 bp in size and were found on chromosomes 5, 10, 11, 13-15, 21, 24 and 25 (Table 1). In total, these sweeps contained 59 genes involved in a number of different biological processes, including cell division, cytokinesis, angiogenesis, development, transcriptional regulation and immune response.
For a detailed list of gene IDs and short descriptions of their functions, see Additional file 1: Table S2.
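The two per-SNP statistics named above, Hudson's FST and the Cochran-Mantel-Haenszel (CMH) test, can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the PoPoolation2 or study code: `hudson_fst` follows the form of Hudson's estimator given by Bhatia et al. (2013), which may differ in detail from the implementation cited as [34], and sample sizes are assumed to be numbers of sampled chromosomes (e.g. 60 for a pool of 30 diploid fish). `cmh_test` is a generic CMH test over 2×2 allele-count tables, one table per stratum; treating each stratum as one northern river paired with one southern river is our assumption. A continuity correction is applied only when the deviation is at least 0.5.

```python
import math
from typing import Sequence, Tuple

# (north_allele1, north_allele2, south_allele1, south_allele2) counts for one stratum
Table = Tuple[int, int, int, int]


def hudson_fst(p1: float, p2: float, n1: int, n2: int) -> float:
    """Per-SNP Hudson FST (form of Bhatia et al. 2013); NaN if the SNP is monomorphic."""
    numerator = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        return math.nan
    return numerator / denominator


def mean_pairwise_fst(freqs_a: Sequence[float], freqs_b: Sequence[float],
                      n_a: int, n_b: int) -> float:
    """Average per-SNP FST for one population pair (one cell of the distance matrix)."""
    values = [hudson_fst(p, q, n_a, n_b) for p, q in zip(freqs_a, freqs_b)]
    values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values)


def cmh_test(tables: Sequence[Table], correct: bool = True) -> Tuple[float, float]:
    """Cochran-Mantel-Haenszel chi-square (1 df) and its p-value over 2x2 strata."""
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d  # north, south allele totals
        col1, col2 = a + c, b + d  # allele 1, allele 2 totals
        observed += a
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    deviation = abs(observed - expected)
    if correct and deviation >= 0.5:
        deviation -= 0.5  # continuity correction
    statistic = deviation ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts for one SNP in two north/south strata
    print(cmh_test([(10, 20, 20, 10), (12, 18, 22, 8)]))
    print(hudson_fst(0.2, 0.8, 60, 60))
```

Averaging per-SNP values, as in `mean_pairwise_fst`, mirrors the description of the distance matrix; a ratio of averaged numerators and denominators is a common alternative when many SNPs have low minor allele frequencies.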
The high number of SNPs and genes in the selective sweeps complicates the task of pinpointing the most important genetic differences. We thus focused on missense mutations, which induce amino acid changes in proteins and are therefore more likely to confer a difference in biological function. Within the identified sweeps, 20 significantly differentiated missense SNPs were found in 15 different genes across 6 selective sweeps (Table 2). Three missense mutations were observed in the sweep on Chr 10, all in a single gene, anln, which encodes an actin-binding protein required for cytokinesis. Three genes on Chr 13 harbor missense mutations: trpc2, which is involved in chemosensory transduction and whose knockout in mice alters sexual, aggressive and parenting behaviors [37]; rrm1, which encodes an enzyme essential for the production of deoxyribonucleotides; and rb1, which acts as a transcriptional repressor of E2F1 target genes and promotes the G0-G1 transition when phosphorylated by CDK3/cyclin-C. The sweep on Chr 14 also contains three genes with missense mutations: adnp, encoding a homeodomain-containing DNA-binding transcription factor; cpsf1, encoding a component of the cleavage and polyadenylation specificity factor complex; and parp10, encoding an ADP-ribosyltransferase involved in apoptosis, NF-kB signaling and DNA damage repair [38]. The sweep on Chr 21 contains one such gene, rnaseh2b, which is linked to a chronic inflammatory disorder in humans [39].
The second most significant selective sweep was found on Chr 5 (Fig. 2b) and included the stress and immune response transcription factor genes nkrf and nkap; zbtb33, encoding a transcriptional regulator that binds methylated CpG dinucleotides; and sowahc, a gene of unknown function. Both Nkrf and Nkap regulate the NF-kB pathway, which controls many cellular processes including inflammation, immunity, differentiation, cell growth and apoptosis; Nkap activates the pathway, while Nkrf mediates transcriptional repression of certain NF-kB responsive genes. Since NF-kB signaling pathways activate the host immune system, components of these pathways are key targets for proteases expressed by invading pathogens [40]. Functional studies of the Nkap protein have revealed roles in T-cell maturation [41] and mRNA splicing [42]. To our knowledge, no previous studies have identified functionally significant SNPs in any of the four genes located within this sweep; however, one of the SNPs found in nkap is located in a highly conserved region necessary for transcriptional repression. At this position, valine is conserved in other species and likely represents the ancestral variant, whereas methionine is the most common variant in Northern Norway (Additional file 1: Figure S2). This finding may be related to differences in immune defense between salmon from the two regions, a suggestion supported by the fact that the NF-kB pathway is differently regulated in IPN-resistant salmon [43]. Further studies are needed to reveal how these SNPs modulate NF-kB function and virus response, or whether other functional properties are associated with the selective sweep on Chr 5.
The most significant sweep was found on Chr 25 and contained a cluster of five mx (myxovirus resistance) genes known to be involved in defense against viruses. Three of these mx genes, mx1-1, mx1-2 and mx2-1, contained missense mutations (Fig. 2c). These proteins are dynamin-like GTPases induced upon virus infection through the innate interferon system. Mx proteins can act broadly against both DNA and RNA viruses as well as specifically against certain viruses [44], and studies in mouse, human and chicken have shown that single missense mutations in Mx1 and Mx2 can confer such specific responses [45][46][47][48]. It is possible that the identified missense SNPs in the mx genes reflect specific adjustments to different viral disease pressures between northern and southern populations of salmon. We identified missense SNPs in all regions of the protein, including a SNP in the antiviral specificity domain in exon 13 (Additional file 1: Figure S3). This SNP represents a structurally relevant amino acid substitution, where arginine appears to be the ancestral variant and cysteine the derived variant dominating in the northern population (Chr 25 position: 47,120,121). Likewise, SNPs in this domain have been associated with specific virus resistance in chicken [49,50] and pig [51]. SNPs in mx genes have also been investigated in another fish species, the turbot [52]; however, properties related to protection against viruses were not investigated in that study. In rainbow trout (Oncorhynchus mykiss), genetic variation in mx exons 3-6 between strains was correlated with susceptibility to infectious hematopoietic necrosis virus (IHNV) [53]. This virus also infects Atlantic salmon, and our discovery of a missense mutation in exon 6 suggests that salmon could have adapted to IHNV (Additional file 1: Figure S3). In addition, different strains of rainbow trout display variable susceptibility to this virus [54]. In this study we cannot elucidate the functional significance of the acquired SNPs in mx in Northern Norway; further studies are needed to reveal whether any of these changes have been involved in host-virus adaptation [55].

[Figure caption fragment: SNPs are indicated as black dots and missense mutations are marked with red squares. The track labeled "HET" shows the heterozygosity of salmon from Northern (blue) and Southern Norway (green) in 3 kb windows. The track labeled "FST" shows the FST between populations in Northern and Southern Norway in 3 kb windows. At the bottom, identified genes are shown, with genes containing differentiated missense mutations colored black. The x-axis shows chromosomal positions in kb.]

[Table footnotes: Allele frequencies from genotyping are illustrated in Fig. 4. (b) Allele frequencies from genotyping are illustrated in Additional file 1: Figure S5.]

We also investigated whether the selective sweeps on Chr 5 and Chr 25 had paralogous regions in the partially tetraploid salmon genome [56]. In silico analysis showed that both sweeps have paralogous regions located on other chromosomes. The Chr 5 sweep has a paralogous region on Chr 9 (Additional file 1: Figure S4, position 51,349,279 to 51,849,279), which did not contain any differentiated SNPs. The synteny is conserved in other species, and the existence of only one copy in zebrafish (Danio rerio), combined with the observation that the missense mutations on Chr 5 are not present in the paralogous genes on Chr 9, indicates that the mutations arose after the salmonid-specific whole genome duplication (WGD). Based on this observation, we speculate that the WGD provided paralogous regions in which one copy was free to sub- or neofunctionalize, much like the theory for duplicated genes [57], which has been suggested to be important for evolutionary adaptation and innovation in salmon [58], in teleosts [59] and in general [30].
A similar picture is seen for the sweep on Chr 25, whose paralogous region on Chr 12 (position 66,552,602 to 67,052,602) harbors a cluster of three mx genes but carries no differentiated SNPs or missense mutations. While the sweeps on Chrs 11, 15, 21 and 24 have no clear paralogous regions, the sweeps on Chr 10, Chr 14 and the two sweeps on Chr 13 also have paralogous regions with very few significantly differentiated SNPs, on Chr 16, 27 and 4, respectively (Fig. 3). Similarly, in our recent discovery of the loci on Chr 25 controlling age at maturity [27], we investigated the two paralogous regions on Chr 21, neither of which contained SNPs associated with the trait. Together, these findings suggest that the partially tetraploid stage may be beneficial for adaptation, since one gene copy or gene cluster can keep the original function while the other adapts to a new situation, such as novel disease pressures.
In this study, the initial resequencing was based only on males, because this allowed reuse of sequence data from our previous work [27]. The targeted SNP analysis, used to validate the resequencing results in a larger independent set of rivers, was conducted using both males and females (Figs. 1 and 4). Genotyping of mixed-sex salmon from 19 rivers (n = 20 salmon/river) along the Norwegian coast (Fig. 1) for five missense SNPs on Chrs 5 and 25 confirmed strong genetic differentiation between salmon populations from the north and south of Norway (Fig. 4). Populations from northern rivers (1-9) displayed allele frequencies in the range 0-0.7, while those from southern rivers (11-19) were close to fixation for one allele at these two loci. Salmon from river 10, Målselv, show intermediate frequencies, which corresponds well with what has been reported in the literature [31,32]. These results also confirm the allele frequency estimates from the pooled resequencing (Table 2). In addition, we designed Sequenom assays for five other missense SNPs in other regions: one SNP each for the sweeps on Chrs 10, 13 and 21, and two SNPs on Chr 14. Genotyping was performed for all 19 rivers (Additional file 1: Figure S5), and the allele frequencies showed the same clear difference between the northern and southern populations. For the SNPs on Chr 14, there appears to be an additional genetic shift between river 14, Stjørdalselva, and the rivers south of it. In addition to the data produced within the present study, resequencing data from a recent publication [28] were downloaded and compared to our results. The downloaded data include three individually sequenced salmon from each of 4 southern and 3 northern salmon rivers in Norway. These data corroborate our resequencing and genotyping results (Additional file 1: Table S3).
Our surveyed SNPs therefore also represent robust genetic markers for distinguishing northern and southern populations of Atlantic salmon in Norway. Future studies on an extended set of populations may reveal whether these are also robust markers for detecting genetic structuring in other parts of the species' distribution range.
Atlantic salmon aquaculture involves rearing domesticated fish that originate from commercial breeding programs. Forty wild populations from both the north and south of Norway were sampled when the national breeding programs for salmon were established [60]. However, analyses of genetic markers demonstrate that salmon from Southern Norway dominate the domesticated lines currently in production [61]. Genetic analyses of farmed salmon escapees in Norway have uncovered genetic introgression into native salmon populations in both Northern and Southern Norway, but the biological consequences remain unknown [32,61,62]. Given the adaptive genetic divergence between wild salmon populations in the north and south of Norway revealed in the present study, the potential negative genetic impact of introgression from domesticated salmon is likely to be greater in northern populations, since the farmed fish originate mostly from wild populations in Southern Norway.

Conclusion
In this study we performed a GWAS by genome resequencing with the aim of screening the Atlantic salmon genome for genetic differentiation between northern and southern populations in Norway. By investigating eight rivers, we uncovered ten particularly striking sweeps, including two clusters of immune related genes harboring missense mutations. A plausible interpretation is that different populations of Atlantic salmon have historically been exposed to different selection pressures in the form of pathogens. Some of these adapted alleles could be advantageous for aquaculture production, which is currently hampered by a number of diseases, including virus infections [63]. Future studies should include gene editing of immune genes found in these selective sweeps [64,65] in combination with viral exposure experiments. In such experiments, viruses relevant to salmon aquaculture should be the primary focus, since finding specific resistance alleles could be of significant value to the industry and could also help protect wild fish against the high disease pressures posed by open-cage aquaculture [66]. Once protective alleles are found, selective breeding of individuals with beneficial haplotypes could improve the welfare of farmed salmon, decrease the disease pressure on wild populations and be economically favorable for the industry. In addition, the impact of genetic introgression from fertile aquaculture escapees on the adaptive genetic properties of wild populations should be investigated. To reduce the risk of such unwanted loss of local adaptation and alteration of fitness-related traits, a sustainable solution would be the use of sterile fish in aquaculture, especially in Northern Norway.
Future studies should also investigate whether paralogous regions of selective sweeps have undergone positive selection or not, as the latter scenario would suggest an evolutionary mechanism which provides higher adaptive possibilities when a genome is partially tetraploid.

Samples and sampling
Scales from 30 Atlantic salmon males per river were selected from a set of 26,000 samples collected in coastal fisheries in Northern Norway. In the Kolarctic Salmon project (http://prosjekt.fylkesmannen.no/Kolarcticsalmon), the multilocus genotypes of all individuals were compared to a genetic baseline of over 180 rivers from Northern Russia and Norway and assigned to their river of origin. Samples assigned with high probability to four rivers in Northern Norway (Altaelva, Reisaelva, Lakselv and Tanaelva) were generously made available to this study. Thirty salmon males from each of four rivers in Southern Norway (Årdalselva, Eidselva, Gloppenelva and Suldalslågen) had been sampled and resequenced in a recent study [27]. In addition, we used DNA from male and female salmon from 19 rivers along the Norwegian coast. These included 20 parr individuals from each of the rivers Grense Jakobselv, Neiden, Bergebyelva, Komagelva, Kongsfjordelva, Langfjordelva, Børselva, Stabburselva, Repparfjordselva, Målselv, Laukhelle, Alvsvågvassdraget, Årgårdsvassdraget, Stjørdalselva, Jølstra, Lyseelva, Bjerkreimselva, Storelva and Enningdalselva (represented by numbers in Fig. 1). With the exception of Enningdalselva, where the sample was obtained from scales collected by recreational fisheries, these samples were obtained from fins collected by electrofishing of juvenile salmon at multiple locations in the rivers.

DNA extraction and sequencing
DNA from the 19 rivers used for genotyping was extracted from scales or fin samples using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer's recommendations. For salmon from the four populations in Northern Norway, total DNA was extracted from scales using the same kit. Equal amounts of DNA from ten individuals were pooled to make three pools per river, totaling 30 individuals per river. Paired-end libraries were constructed using the Genomic DNA Sample Preparation Kit (Illumina, CA, USA) according to the manufacturer's instructions and sequenced on the Illumina HiSeq2000 platform (Illumina, CA, USA) at the Norwegian Sequencing Centre (https://www.sequencing.uio.no, Oslo, Norway), with each pool sequenced in a separate lane.

Sequence mapping and SNP calling
To ensure high-quality sequences, reads were inspected with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapter removal and quality trimming were done with Cutadapt [67], resulting in an average of 1,077,839,448 (SD 40,407,492) paired reads per river. Reads were mapped to the most recent release of the salmon genome [68] without soft clipping (end-to-end mode). To increase mapping sensitivity, the seed length (-L parameter) was set to 18 and the interval between extracted seeds (-i parameter) was set to S,1,1.5, corresponding to the function f(L) = 1 + 1.5*sqrt(L), where L is the read length. Additionally, the maximum number of mismatches per seed (-N parameter) was set to L,0,0.1, corresponding to f(L) = 0 + 0.1*L, and the minimum alignment score (--score-min parameter) was set to L,-0.6,-0.4, corresponding to f(L) = -0.6 + -0.4*L. To remove ambiguously mapped reads, the mapping quality threshold was set to 20. To obtain higher sequence coverage, the three sequenced pools per river were merged into a single BAM file using SAMtools merge. SNPs were called using SAMtools mpileup [69], and the output was parsed using the PoPoolation2 package (mpileup2sync.jar) [70] with a minimum base quality threshold of 20. For a SNP to be included in the final set of high-quality SNPs, a minimum coverage of 10 and a maximum coverage of 50 (99th percentile) were required in each river. In addition, the total number of observed minor alleles was required to be at least 8. Recently published whole genome resequencing data from individual fish [28] were downloaded and mapped to the reference genome. The data included three salmon from each of the rivers Tanaelva, Repparfjordelva, Altaelva, Namsenelva, Årgårdsvassdraget, Nausta and Jølstra, where the first three represent populations in Northern Norway and the last four represent Southern Norway.
Accession numbers for the samples are shown in the caption of Additional file 1: Table S3.
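The mapping options and SNP filters above can be sketched in Python as follows. The option strings follow the function syntax of the Bowtie 2 aligner ("F,B,A" meaning f(L) = B + A·F(L), with F = L for linear, S for square root or G for natural log); naming Bowtie 2 is our assumption, inferred from the options used, since the aligner is not named in this section. `passes_snp_filter` is a hypothetical helper that assumes biallelic SNPs with per-river counts of the two alleles and applies the thresholds given in the text.

```python
import math
from typing import Sequence, Tuple

# Transformations used in Bowtie 2-style function strings "F,B,A"
_TRANSFORMS = {
    "L": lambda x: x,             # linear
    "S": lambda x: math.sqrt(x),  # square root
    "G": lambda x: math.log(x),   # natural log
}


def eval_function_string(spec: str, read_length: int) -> float:
    """Evaluate f(L) = B + A * F(L) for a spec such as "S,1,1.5"."""
    kind, const, coef = spec.split(",")
    return float(const) + float(coef) * _TRANSFORMS[kind](read_length)


def passes_snp_filter(river_counts: Sequence[Tuple[int, int]],
                      min_cov: int = 10, max_cov: int = 50,
                      min_minor_total: int = 8) -> bool:
    """Keep a SNP only if every river has coverage within [min_cov, max_cov] and
    the minor allele (lower total count across rivers) is seen >= min_minor_total times."""
    if any(not (min_cov <= a + b <= max_cov) for a, b in river_counts):
        return False
    total_a = sum(a for a, _ in river_counts)
    total_b = sum(b for _, b in river_counts)
    return min(total_a, total_b) >= min_minor_total


if __name__ == "__main__":
    for spec in ("S,1,1.5", "L,0,0.1", "L,-0.6,-0.4"):
        print(spec, eval_function_string(spec, 100))  # hypothetical 100 bp read
```

For a hypothetical 100 bp read, S,1,1.5 gives a seed interval of 16 and L,-0.6,-0.4 a minimum alignment score of -40.6; the actual read length is not stated in this section.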

Statistical analysis
Pairwise fixation indices (FST) between all eight sequenced populations were calculated for all high-quality SNPs using Hudson's estimator of FST [34]. FST values were averaged over all SNPs for each pair of populations to generate a distance matrix, using FST as genetic distance. This matrix was converted to a Newick tree using NEIGHBOR from the PHYLIP package [71], and a phylogenetic tree was drawn with NJplot [72]. To find SNPs with significantly different allele frequencies (0.1 % FDR) between populations from Northern and Southern Norway, the Cochran-Mantel-Haenszel test for repeated tests of independence from the PoPoolation2 package (cmh-test.pl) [70] was used. The FDR threshold was determined using the method described in [73]. Allele counts for each river were merged to obtain the total allele count per SNP in Northern and Southern Norway, corresponding to 120 individuals per geographical region. From these counts, FST between the northern and southern populations was estimated for each SNP using the FST calculation from the PoPoolation2 package (fst-sliding.pl), with the --pool-size parameter set to 120. Genomic regions with low heterozygosity may indicate selection. Heterozygosity was therefore estimated separately for the north and south of Norway for each SNP as 2pq, where p and q are the major and minor allele frequencies. Sliding windows of 50 kb with steps of 25 kb were used to find genomic regions with high FST and low heterozygosity in either Northern or Southern Norway. This approach is similar to one used to discover genomic regions under selection in other animals [36]. To identify putative selective sweeps, the average FST of a window was required to be at least 0.17 (above the 99.9th percentile) and the average heterozygosity of the window in either Northern or Southern Norway was required to be at most 0.15 (below the 5th percentile) (Additional file 1: Figure S1).
The thresholds were chosen to capture the outliers in the FST and heterozygosity distributions. Putative sweeps were extended on both sides as long as the neighboring windows had either an average FST of at least 0.17 or an average heterozygosity of at most 0.15 in either Northern or Southern Norway. Sweeps less than 50 kb apart were joined to avoid fragmentation of the putative selective sweeps. Genomic windows containing more than 10 % ambiguous bases (Ns) in the reference assembly were discarded to exclude regions with high levels of uncertainty.
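The window scan and sweep-calling rules above can be sketched in Python as follows. This is a hypothetical re-implementation of the described rules, not the authors' code: per-SNP FST and per-region heterozygosity are taken as inputs (in the study, FST came from PoPoolation2's fst-sliding.pl), windows are indexed by start position on a single chromosome, and the fraction of ambiguous reference bases per window is supplied by the caller through an assumed `n_fraction` mapping.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

Snp = Tuple[int, float, float, float]          # (position, fst, het_north, het_south)
Window = Tuple[int, int, float, float, float]  # (start, end, fst, het_north, het_south)


def heterozygosity(minor_freq: float) -> float:
    """Per-SNP heterozygosity 2pq for a biallelic SNP."""
    return 2.0 * minor_freq * (1.0 - minor_freq)


def window_stats(snps: Sequence[Snp], chrom_length: int, size: int = 50_000,
                 step: int = 25_000, n_fraction: Optional[Dict[int, float]] = None,
                 max_n: float = 0.10) -> List[Window]:
    """Average FST and heterozygosity per sliding window; skip empty or N-rich windows."""
    n_fraction = n_fraction or {}
    windows = []
    for start in range(0, max(chrom_length - size, 0) + 1, step):
        end = start + size
        if n_fraction.get(start, 0.0) > max_n:
            continue  # too many ambiguous reference bases
        inside = [s for s in snps if start <= s[0] < end]
        if inside:
            windows.append((start, end, mean(s[1] for s in inside),
                            mean(s[2] for s in inside), mean(s[3] for s in inside)))
    return windows


def call_sweeps(windows: Sequence[Window], fst_min: float = 0.17, het_max: float = 0.15,
                step: int = 25_000, merge_gap: int = 50_000) -> List[Tuple[int, int]]:
    """Seed sweeps at windows meeting both criteria, extend through contiguous
    windows meeting either criterion, then join sweeps less than merge_gap apart."""
    def seed(w: Window) -> bool:
        return w[2] >= fst_min and min(w[3], w[4]) <= het_max

    def extend(w: Window) -> bool:
        return w[2] >= fst_min or min(w[3], w[4]) <= het_max

    regions = []
    for i, w in enumerate(windows):
        if not seed(w):
            continue
        lo = hi = i
        while lo > 0 and windows[lo - 1][0] == windows[lo][0] - step and extend(windows[lo - 1]):
            lo -= 1
        while (hi + 1 < len(windows) and windows[hi + 1][0] == windows[hi][0] + step
               and extend(windows[hi + 1])):
            hi += 1
        regions.append((windows[lo][0], windows[hi][1]))

    merged: List[Tuple[int, int]] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < merge_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # overlap or small gap
        else:
            merged.append((start, end))
    return merged
```

Extension only crosses windows whose start positions are exactly one step apart, so a window discarded for ambiguous bases also breaks a sweep at that point; this is one reasonable reading of the rules, not necessarily the original behavior.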

SNP annotation
Genes in the sweep regions were obtained from the official genome annotation (NCBI Salmo salar Annotation Release 100). Missense mutations in selective sweep regions were identified by manual inspection of the coding sequences. Amino acid sequences of the five mx genes found in the selective sweep on Chr 25 were aligned to the homologs Mx1 and Mx2 from human and MxD and MxG from zebrafish using BLASTP (default parameters). Functional domains in the Mx proteins were assigned using domain information for human Mx1 from UniProt. Amino acid sequences from the four genes containing missense mutations in the selective sweep on Chr 5 (nkrf, sowahc, nkap and zbtb33) were aligned to homologous zebrafish and northern pike genes using BLASTP with default parameters. Synteny between genes in the sweep on Chr 5 and other species was assessed by inspecting the syntenic regions of zebrafish, human and mouse in the UCSC Genome Browser (https://genome.ucsc.edu). Paralogous regions of the sweeps were identified by searching the salmon genome with the genes in the sweeps as queries using TBLASTN (default parameters).
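Although missense mutations were identified here by manual inspection, the underlying check can be sketched in Python as follows. This hypothetical helper assumes the coding sequence is given on the coding strand starting at the start codon (genes on the minus strand would first need to be reverse-complemented) and uses the standard genetic code. The example sequence is invented for illustration and shows an arginine codon (CGT) changed to a cysteine codon (TGT), the kind of substitution described for the mx antiviral specificity domain.

```python
# Standard genetic code, codons ordered by bases T, C, A, G (first base varies slowest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snp(cds: str, cds_pos: int, alt: str) -> str:
    """Classify a SNP at 0-based position cds_pos of a coding sequence."""
    cds, alt = cds.upper(), alt.upper()
    offset = cds_pos % 3
    codon_start = cds_pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if ref_aa == "*":
        return "stop lost"
    if alt_aa == "*":
        return "nonsense"
    # e.g. "missense R2C": arginine at residue 2 replaced by cysteine
    return f"missense {ref_aa}{codon_start // 3 + 1}{alt_aa}"


if __name__ == "__main__":
    print(classify_snp("ATGCGTTAA", 3, "T"))  # hypothetical CGT -> TGT change
```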

Genotyping
Twenty salmon from each of 19 rivers along the Norwegian coastline (n = 380) were genotyped for ten of the most significant missense mutations on a Sequenom MassARRAY iPLEX platform (San Diego, CA, USA). Primers and extension primers are listed in Additional file 1: Table S4. The genotyping primers were designed not to target any paralogous genes in the genome.
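Per-river allele frequencies from the genotyping can be derived from diploid genotype calls as sketched below. The input format (two-letter genotype strings such as "AG", with None or an empty string for failed calls) and the river labels in the example are assumptions for illustration, not the actual Sequenom MassARRAY output format.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def allele_frequency(genotypes: Sequence[Optional[str]], allele: str) -> float:
    """Frequency of `allele` among successfully called diploid genotypes."""
    called = [g for g in genotypes if g]  # drop failed calls (None or "")
    if not called:
        return float("nan")
    counts = Counter(base for g in called for base in g)
    return counts[allele] / (2 * len(called))


def river_frequencies(calls_by_river: Dict[str, Sequence[Optional[str]]],
                      allele: str) -> Dict[str, float]:
    """Allele frequency per river for one SNP."""
    return {river: allele_frequency(calls, allele) for river, calls in calls_by_river.items()}


if __name__ == "__main__":
    # Hypothetical calls for one SNP in two rivers
    print(river_frequencies({"river_1": ["AA", "AG", None], "river_2": ["GG", "GG"]}, "A"))
```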

Funding
This project was financed by the Norwegian Research Council (NFR) through its HAVBRUK-BIOTEK 2021 program (project number 226221-SALMAT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Availability of data and materials
Genomic sequences from all sequenced pools used in this study have been made available on SRA with Bioproject number PRJNA305872. A list of high quality SNPs has been deposited at http://marineseq.imr.no/northsouth2016/.
Authors' contributions
AW, KAG, VW, CJR and RBE conceived and designed the experiments. FA, EKS and GD conducted laboratory experiments. FA, EKS, TF, CJR, AW and RBE analyzed the data. VW, EN, MO and JPV provided samples for analysis. EKS, FA, KAG, AW and RBE wrote the first draft of the paper. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Scale samples from adult wild salmon in the rivers were collected by recreational anglers. Scale samples of salmon from four of the northern rivers were collected in commercial coastal fisheries for salmon. Thus, no permits or licenses were required for the collection of these samples. Juvenile samples from the other rivers were collected by the authors or by several cooperating agencies, with permits from the County Governor in the respective counties.
Author details