A genome scan for selection signatures comparing farmed Atlantic salmon with two wild populations: Testing colocalization among outlier markers, candidate genes, and quantitative trait loci for production traits

Abstract Comparative genome scans can be used to identify chromosome regions, but not traits, that are putatively under selection. Identification of targeted traits may be more likely in recently domesticated populations under strong artificial selection for increased production. We used a North American Atlantic salmon 6K SNP dataset to locate genome regions of an aquaculture strain (Saint John River) that were highly diverged from that of its putative wild founder population (Tobique River). First, admixed individuals with partial European ancestry were detected using STRUCTURE and removed from the dataset. Outlier loci were then identified as those showing extreme differentiation between the aquaculture population and the founder population. All Arlequin methods identified an overlapping subset of 17 outlier loci, three of which were also identified by BayeScan. Many outlier loci were near candidate genes and some were near published quantitative trait loci (QTLs) for growth, appetite, maturity, or disease resistance. Parallel comparisons using a wild, nonfounder population (Stewiacke River) yielded only one overlapping outlier locus as well as a known maturity QTL. We conclude that genome scans comparing a recently domesticated strain with its wild founder population can facilitate identification of candidate genes for traits known to have been under strong artificial selection.

Domestication can leave detectable signatures of selection within the genomes of agricultural species because of strong artificial selection for specific phenotypes that increase yield in a controlled farm environment. Comparison of the genomes of domesticated species to their wild founder populations can help identify the genes underlying differentially selected traits, thereby advancing a fundamental goal of evolutionary biology (Stinchcombe & Hoekstra, 2008).
Population genomics approaches have been used to support the hypothesis of artificial selection by humans over the past millennium causing allele frequency changes at major loci that determine cob size in maize (Vigouroux et al., 2002), muscle growth in pigs (Van Laere et al., 2003), and coat color in domestic mammals (Cieslak, Reissmann, Hofreiter, & Ludwig, 2011). Motivation behind genome scans for selection signatures lies in the possibility of finding DNA markers associated with traits of economic interest that can be used for marker-assisted selection (López, Neira, & Yáñez, 2015;Yáñez, Houston, & Newman, 2014).
Two scenarios can theoretically arise when a domestic population is founded from a wild population and then subjected to strong artificial selection for a value of a trait that is largely determined by a major locus: (i) a hard sweep, where a new favorable mutation at the major locus rapidly becomes fixed, initially resulting in a single long haplotype block surrounding the mutation that is well differentiated from the wild population along its entire length, and (ii) a soft sweep, where the frequency of a pre-existing favorable allele at the major locus increases along with all the multiple original haplotypes containing it; the latter results in short haplotype blocks immediately surrounding the locus that are differentiated from the wild population but preserves pre-existing variation in more distant regions (López et al., 2015). Genome scans support the existence of long haplotype blocks in domestic chickens, where regions of low heterozygosity containing previously described candidate genes for laying traits can span several megabases (Mb), suggesting recent hard sweeps (Qanbari et al., 2012). Soft sweeps characterized by narrow divergent haplotype blocks were more common than hard sweeps characterized by wide divergent haplotype blocks in a population of maize that had undergone strong positive directional selection for ear number for 30 generations (Beissinger et al., 2014). In practice, these two scenarios represent a continuum, with the long haplotype blocks of the first becoming shorter and harder to detect over millennia unless they are in a nonrecombining region of the genome (Ai et al., 2015).
The increased availability of high-resolution SNP datasets and genome resequencing datasets for livestock species has permitted genome scans of livestock populations to be compared with those of their wild ancestral population(s). Changes in the genomes of different breeds of domesticated pigs detected by comparing SNP allele frequencies with those of ancestral wild boars are attributed to nearby genes under selection (Ai et al., 2015; Li et al., 2013). Similar studies have been published in cattle (The Bovine HapMap Consortium, 2009) and sheep (Chessa et al., 2009).
Statistical methods are being developed to identify "outlier" loci that have either higher or lower differentiation in allele frequencies among populations than expected under models without selection (Beaumont & Balding, 2004;Beaumont & Nichols, 1996;Excoffier, Hofer, & Foll, 2009;Foll & Gaggiotti, 2008). Most methods measure genetic differentiation among populations using F ST, a standardized measure of divergence among populations that can be calculated separately for each genetic locus (Weir & Cockerham, 1984;Wright, 1965). Markers showing higher than expected F ST values are identified as outliers putatively under population-specific directional selection, whereas those with lower than expected F ST values are identified as outliers putatively under balancing selection (Cavalli-Sforza, 1966;Lewontin & Krakauer, 1973;Strasburg et al., 2012). Outlier analyses (Beaumont & Nichols, 1996) assume that the two or more populations being compared are each monophyletic in origin. To satisfy this assumption, any sampled individuals that are hybrids between highly divergent strains or subspecies should be removed from the dataset before doing the analysis. Hybrid or admixed individuals can often be detected using software such as STRUCTURE first to estimate the number of founder population clusters present in a dataset; the average multilocus allele frequencies characteristic of each cluster can then be used to identify admixed individuals (Falush, Stephens, & Pritchard, 2003;Hubisz, Falush, Stephens, & Pritchard, 2009;Pritchard, Stephens, & Donnelly, 2000).
Signatures of selection within the genomes of Atlantic salmon may sometimes be easier to detect than those in livestock because artificial selection for economically important traits has only been implemented for 5 to 15 generations (Gjedrem, Gjoen, & Gjerde, 1991;Quinton, McMillan, & Glebe, 2005;Wood, Anutha, & Peschken, 1990) rather than 5,000 generations (Craig, 1981). Firstly, because of the low number of generations, samples from putative wild founder populations that have not experienced large changes in allele frequencies from genetic bottlenecks are more likely to be available. Secondly, 5-15 generations is insufficient time for recombination to have reduced the size of long haplotype blocks resulting from hard sweeps (Sabeti et al., 2002), making outlier loci easier to detect with a low density of markers. Finally, realistic simulations of a single locus trait that is under moderately strong selection (s > 0.25) in the aquaculture population but not in the wild founder population for fewer than 10 generations show that the statistical power to detect the divergence in allele frequencies is high (β > 0.8) (Karlsson & Moen, 2010).
In contrast, the similar selection differentials applied to classic quantitative traits like growth are assumed to cause only minor changes in allele frequencies (Falconer & Mackay, 1996) that would be undetectable using outlier locus methodology (Wellenreuther & Hansson, 2016), but see Fontanesi et al. (2015). Both single locus and quantitative traits might show rapid genetic divergence between aquaculture strains and wild populations of Atlantic salmon. Traits that may change in aquaculture strains as a direct response to artificial selection include increased growth rate, later adult maturity, redder flesh color, and better taste (Rye, Gjerde, & Gjedrem, 2010). Traits that may change as an indirect response to selection for increased growth rate under hatchery conditions include increased levels of aggression, increased boldness, altered parasite or disease resistance, increased appetite, and higher food conversion efficiency (Araki & Schmid, 2010;Bekkevold, Hansen, & Nielsen, 2006).
We hypothesized that an outlier analysis comparing the genomes of recently created North American aquaculture populations of Atlantic salmon from the Saint John River (SJR) with that of their putative wild founder population from the Tobique River would be likely to identify genome regions underlying traits known to be under strong artificial selection in the hatchery environment. First, STRUCTURE was used to identify admixed individuals (Bradbury et al., 2015;Pritchard et al., 2016) with partial European ancestry so that they could be removed from the dataset. Second, two outlier locus detection programs (Arlequin 3.5 and BayeScan 2.1) were used to compare the putative founder population to four nonoverlapping year classes of the aquaculture population. Parallel analyses comparing the aquaculture population with a nearby nonfounder wild population from the Stewiacke River were conducted to determine whether an overlapping set of outlier loci would be discovered. Third, we located the chromosome position of each outlier locus on linkage maps and compared its position with that of previously published (a) outlier loci, (b) candidate genes, and (c) QTLs for growth, life history, and immune traits.
This allowed us to identify the genome regions, as well as some putative candidate genes, that have responded to either deliberate artificial selection for growth and late maturity or to accidental selection exerted by the hatchery environment.

| Sampling strategy and SNP genotyping
Atlantic salmon belonging to eight putative populations and one hybrid population were sampled between 2006 and 2012 (Table 1).
The Canadian populations included two generations from four independent year classes (hereafter "populations") of the SJR aquaculture strain (AQUA). They also included 98 hatchery-spawned adults captured as wild smolts from the Tobique River (TOB_WILD), which is an upper tributary of the SJR, and 100 hatchery-spawned "wild" adults from the Stewiacke River (STW_WILD) as part of the live gene-banking program (O'Reilly & Doyle, 2007).
The Mactaquac Biodiversity Facility (MBF) was built in 1968 to mitigate the effects of the Mactaquac dam on the SJR. The completion of the dam in 1968 prevented salmon from swimming upriver to their historical spawning areas. Thereafter, Atlantic salmon that returned to spawn just below the Mactaquac dam were collected in a "fish lift". The "wild" fish were transferred from the lift to a truck and released upriver, while the "ranched" fish were transferred to holding tanks and spawned in the MBF. The Tobique River comprised about 60% of the original spawning habitat of the SJR tributaries above the Mactaquac dam (http://www.inter.dfo-mpo.gc.ca/Maritimes/Biodiversity-Facilities).
The subsequent 4-5 generations of each of the four separate AQUA populations were created by crossing the 50 male and the 100 female candidate broodstock with the highest estimated breeding values for saltwater growth in a paternal half-sibling design (J. A. K. Elliott, pers. obs.).

Atlantic salmon populations in the Bay of Fundy in eastern Canada can be divided into two regional groups: those in the outer Bay of Fundy (oBoF) [e.g., the Saint John River (SJR) and the Tobique River] and those in the inner Bay of Fundy (iBoF) (e.g., Stewiacke River), using evidence from microsatellite analysis (King, Kalinowski, Schill, Spidle, & Lubinski, 2001), allozyme studies (Verspoor, O'Sullivan, Arnold, Knox, & Amiro, 2002), gene expression patterns (Tymchuk, O'Reilly, Bittman, Macdonald, & Schulte, 2010), and SNPs (Freamo et al., 2011). We chose TOB_WILD as a putative founder population of AQUA; however, we acknowledge that salmon from any tributary of the SJR above the Mactaquac dam (http://atlanticsalmonfederation.org/rivers/newbrunswick.html) could potentially have contributed to AQUA (Farmer, 1991). The iBoF-STW_WILD population was chosen as a putative outgroup because it is not thought to have contributed to the SJR aquaculture strain (O'Reilly & Doyle, 2007) and because it is known to grow more slowly than the oBoF-TOB_WILD population. Finally, to identify individuals from AQUA populations with partial European ancestry, we genotyped a purebred, full-sibling family of fish from the Mowi aquaculture strain (MOWI_EU) founded from the Voss River region in Norway (Ferguson et al., 2007) as well as some hybrids from documented F1 and backcrosses between MOWI_EU and AQUA (Boulding et al., 2008). This was necessary because AQUA had been reported to have some residual European ancestry due to the importation of the European subspecies into Maine (Glebe, 1998;O'Reilly, Carr, Whoriskey, & Verspoor, 2006). Atlantic salmon (Salmo salar L.) can be characterized as two subspecies, the "North American" and the "European" (King et al., 2007), which differ in the number of chromosomes (Brenna-Hansen et al., 2012).

| Population structure analysis
All individuals from the nine sampled putative populations were included in the Bayesian cluster population structure analysis with STRUCTURE 2.3.4 (Hubisz et al., 2009;Pritchard et al., 2000). To infer the optimal number of clusters, K, the STRUCTURE simulation results were analyzed according to the delta K method (Evanno, Regnaut, & Goudet, 2005). We tested values of K from 1 to 10, so that the maximum was larger than the number of putative populations (n = 9), thereby avoiding inappropriate clustering due to K being set too small (Kalinowski, 2011). Five replicate runs were performed for each value of K using the admixture model and no prior probabilities for cluster membership. For each run, a burn-in period of 50,000 iterations was followed by 500,000 final iterations. The graphical bar plot of membership coefficients was generated using the DISTRUCT software with the fill color of the bars indicating cluster membership (Rosenberg, 2004). Each individual fish was also given an estimated membership coefficient for each of the K clusters, corresponding to the fraction of its genome inferred to have ancestry in the cluster. Some individuals in our dataset were assigned to two or more clusters (or putative, genetically distinct populations), suggesting that their genotypes were admixed because of past hybridization.
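The delta K calculation mentioned above can be illustrated with a short sketch. This is not the authors' script: the function name and the toy log-likelihood values are hypothetical, and the sketch assumes the commonly used variant in which the second-order difference is taken on the mean ln P(D) across replicate runs and divided by the standard deviation of ln P(D) at that K.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} from replicate STRUCTURE log-likelihoods.

    lnp_by_k maps each tested K to a list of ln P(D) values, one per
    replicate run (hypothetical input layout, assumed for illustration).
    """
    mean_l = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        # Delta K needs both neighbours, so the smallest and largest
        # tested K cannot be evaluated.
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # undefined when replicates are identical
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        result[k] = second_diff / sd
    return result


# Toy values, invented purely to exercise the function.
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-790.0, -792.0],
    4: [-785.0, -787.0],
}
toy_delta_k = delta_k(toy_runs)
best_k = max(toy_delta_k, key=toy_delta_k.get)
```

Because the statistic depends on the likelihood at K − 1 and K + 1, it cannot be computed for the lowest or highest K tested, which is one reason to test a range wider than the number of putative populations.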

| Pairwise genetic distances among putative populations
We were primarily interested in comparing the genetic distance between TOB_WILD and the four populations of AQUA as they diverged from a common ancestral population five to six generations ago (Table 1). All genetic distance calculations were performed on a reduced dataset after individual fish that had been estimated by STRUCTURE to have more than 5% European ancestry had been removed. Pairwise comparisons of several measures of genetic distance were then estimated using Arlequin 3.5 (Excoffier & Lischer, 2010) with 10,000 permutations to determine statistical significance.
TABLE 1 Atlantic salmon (Salmo salar L.) samples analyzed in the present study
f The sampled "wild-exposed two-year salmon" from single-pair crosses were captured as smolts or presmolts from the wild by DFO technicians (i.e., caught as smolts in the rotary screw traps set lower in the Tobique River system) then reared in the Mactaquac hatchery for 2 years to maturity.

We present both the pairwise F ST values (Weir & Cockerham, 1984) calculated using Nei's mean number of pairwise differences between populations (Nei & Li, 1979) as well as Slatkin's linearized F ST (Slatkin, 1995) as implemented in Arlequin 3.5 (Excoffier & Lischer, 2010). The latter is recognized as a good method of obtaining an approximately linear genetic distance for binary characters such as SNPs (Excoffier & Lischer, 2010).
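For readers unfamiliar with the linearized distance, the transformation can be written in a few lines. This is only an illustration of the standard formula F ST / (1 − F ST); the function name is hypothetical, and Arlequin's own implementation was used for the reported values.

```python
def slatkin_linearized_fst(fst):
    """Slatkin's linearized distance, D = FST / (1 - FST).

    Assumes 0 <= fst < 1; the transformation is undefined at fst = 1.
    """
    if not 0.0 <= fst < 1.0:
        raise ValueError("fst must satisfy 0 <= fst < 1")
    return fst / (1.0 - fst)


# Illustrative input only: an FST of 0.5 maps to a distance of 1.0.
example_distance = slatkin_linearized_fst(0.5)
```

The transformed value grows faster than F ST itself as populations diverge, which is what makes it approximately linear with divergence time under the assumptions of the method.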

| Detecting loci under divergent selection in AQUA versus WILD
Methods of detecting F ST outliers resulting from "soft sweeps" suitable for low-density SNP chip datasets include (i) FDIST2, which compares the F ST observed for a locus to the F ST expected under neutrality relative to its observed heterozygosity assuming an "n-island" model where the effective population size of all populations is constant as is the pairwise population migration rate (Antao, Lopes, Lopes, Beja-Pereira, & Luikart, 2008;Beaumont & Nichols, 1996), (ii) Arlequin v.3.5, which implements FDIST2 methodology with the addition of a hierarchical island option that assumes two different constant migration rates with the lower rate among regional groups of populations and the higher rate among populations within the same region (Excoffier et al., 2009), and (iii) BayeScan, which separately estimates the posterior probability that each locus is under selection without assuming that all effective population sizes and migration rates are equal (Beaumont & Balding, 2004;Foll & Gaggiotti, 2008). In this study, Arlequin 3.5 and BayeScan 2.1 were chosen for the outlier analysis as a good combination to reduce type I (false positive) and type II (false negative) errors. Haplotype-based tests for hard sweeps (Sabeti et al., 2002) require higher-resolution coverage of the genome for traits of unknown location than was possible with our 3980 SNP dataset.
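To make the two quantities that FDIST2-style methods compare more concrete, the sketch below computes a per-locus F ST and total heterozygosity for one biallelic SNP in two populations. It is a deliberately simplified illustration: it uses Wright's (H T − H S) / H T with equal population weights rather than the Weir and Cockerham estimator, it omits the coalescent simulations that generate the neutral expectation, and the function name is hypothetical.

```python
def locus_fst(p1, p2):
    """Return (fst, h_t) for one biallelic SNP in two populations.

    p1 and p2 are the frequencies of the same allele in each population.
    fst is None when the locus is monomorphic overall (h_t == 0).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled populations.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_t == 0.0:
        return None, 0.0
    return (h_t - h_s) / h_t, h_t


# Hypothetical frequencies: 0.2 in one population, 0.8 in the other.
fst_example, het_example = locus_fst(0.2, 0.8)
```

An outlier test then asks whether the observed F ST at a locus is unusually high or low for loci of similar heterozygosity under the assumed neutral model.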
All outlier locus detection analyses used the reduced dataset that omitted any individual identified by STRUCTURE as having European ancestry, as this might have affected which outlier loci were detected (Gosset & Bierne, 2013). We used four different methods of comparing the TOB_WILD population with the AQUA populations to find regions of the genome with differences in SNP allele frequencies that were greater than would be expected due to genetic drift alone. Extreme differences in allele frequencies at a particular SNP locus are most parsimoniously explained by recent positive artificial selection on the AQUA population that has not been experienced by the TOB_WILD population since they shared a common ancestral population 5 to 6 generations earlier. We were particularly interested in finding "consistent" loci that were identified as an outlier locus putatively under diversifying selection by more than one of the four methods.
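The idea of a "consistent" outlier locus can be sketched as a simple count of how many methods flagged each locus. The function name, the method labels, and the locus identifiers below are hypothetical placeholders, not results from this study.

```python
from collections import Counter


def consistent_loci(outliers_by_method, min_methods=2):
    """Return loci flagged as outliers by at least min_methods methods.

    outliers_by_method maps a method label to an iterable of locus names.
    """
    counts = Counter(
        locus
        for loci in outliers_by_method.values()
        for locus in set(loci)  # count each locus once per method
    )
    return {locus for locus, n in counts.items() if n >= min_methods}


# Placeholder data, invented for illustration.
toy_outliers = {
    "method_1": ["snp_a", "snp_b", "snp_c"],
    "method_2": ["snp_a", "snp_c"],
    "method_3": ["snp_a", "snp_d"],
}
shared_by_two = consistent_loci(toy_outliers, min_methods=2)
shared_by_all = consistent_loci(toy_outliers, min_methods=3)
```

Raising `min_methods` trades sensitivity for specificity: loci supported by every method are fewer but less likely to be artifacts of any one model's assumptions.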
Three of the outlier detection methods, Method 1 "nonhierarchical" using FDIST2, Method 2 "pairwise" using FDIST2, and Method 3 "hierarchical" island model, all used Arlequin 3.5's module "Detect loci under selection" (Excoffier & Lischer, 2010). We identified all SNP loci with p values <.01 as putative outlier loci under diversifying selection but took special note of outlier loci with p values <.001 (e.g., Johnston et al., 2014). (Table 1). Method 2 was composed of six separate pairwise analyses using the nonhierarchical (FDIST2) option of Arlequin 3.5, each comparing one of the AQUA populations with the TOB_WILD population.
Method 3 used the hierarchical island version of the outlier module in Arlequin 3.5 that implements a hierarchical FDIST2 (Excoffier et al., 2009). This module has a hierarchical island model that allows for lower migration rates among different "island" groups than among populations within groups. To allow estimation of its within-group variance, the TOB_WILD population was randomly split into two populations and placed in one group and the three largest
In the pairwise Arlequin analyses that we conducted for Method 2, we noticed that the outliers were very similar between the parent and grandparent generations of an AQUA population (Appendix S3). Therefore, for Methods 1 and 3, we merged the data for the parents (P) and the grandparents (G) into a single "combined" population.
Method 4 used BayeScan, which employs a nonhierarchical Bayesian approach to detecting outlier loci (Foll & Gaggiotti, 2008). Method 1 and Method 3 outlier loci analyses that compared the AQUA populations to the STW_WILD population are only briefly presented in the main part of the manuscript because our primary focus was on divergence between the TOB_WILD and AQUA populations.

| BLASTX analysis and chromosomal location of the outlier loci
Possible mechanisms of selection acting on outlier loci were investigated based on their similarity to known proteins. The translated DNA sequence that contained the outlier SNP was queried against the NCBI protein database (BLASTX). Simultaneously, we also referred to the "best hits" blast results for this SNP chip from Bourret, Dionne, et al. (2013), who set a minimal E-value of 1 × 10^−3 for BLASTX analysis with BLAST2GO and a minimal E-value of 1 × 10^−10 for gene ontology (GO) terms.
We also used BLASTN against the Salmo salar genome to identify the location of 17 consistent outlier SNPs on the physical map of the Atlantic salmon genome . To do this, we first

| Colocalization of outlier loci and published QTLs for growth, maturity, and domestication traits
We tested for colocalization of outlier loci and weight and maturity QTLs previously published for Atlantic salmon, as has previously been done for whitefish (Rogers & Bernatchez, 2005), pea aphid (Via & West, 2008), and apples (Leforestier et al., 2015). We attempted to iden-

| Genotyping
We genotyped 1155 salmon from nine salmon populations on the 6K SNP chip. We first removed loci that had a minor allele frequency (MAF) that did not exceed 5% in any population and then removed loci that clustered poorly because they were products of simultaneously genotyping duplicated regions (MSVs, PSVs) (Appendices S1-S6). A total of 3980 SNPs remained in our dataset of which 3901 could be located on the North American Atlantic salmon linkage map.
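The minor allele frequency filter described above could be expressed roughly as follows. This is a minimal sketch rather than the authors' pipeline: the function names and example frequencies are hypothetical, and it assumes that per-population allele frequencies have already been computed for each biallelic SNP.

```python
def minor_allele_frequency(allele_freq):
    """MAF of a biallelic SNP given the frequency of either allele."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return min(allele_freq, 1.0 - allele_freq)


def passes_maf_filter(freq_by_population, threshold=0.05):
    """Keep a locus if its MAF exceeds the threshold in at least one population.

    freq_by_population maps a population label to an allele frequency.
    """
    return any(
        minor_allele_frequency(p) > threshold
        for p in freq_by_population.values()
    )


# Hypothetical loci: one nearly fixed everywhere, one variable in one population.
nearly_fixed = {"pop_1": 0.99, "pop_2": 0.97, "pop_3": 1.00}
variable_in_one = {"pop_1": 0.99, "pop_2": 0.80, "pop_3": 1.00}
keep_nearly_fixed = passes_maf_filter(nearly_fixed)
keep_variable = passes_maf_filter(variable_in_one)
```

Applying the threshold per population rather than to pooled frequencies retains loci that are informative in only one population, which matters when the aim is to detect population-specific divergence.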

| Population structure
STRUCTURE analysis clearly showed that K = 2 groups gave the best fit to our dataset of nine putative populations (Fig. S1). All samples known to have European ancestry were assigned to cluster 2 and were omitted from all subsequent analysis. STRUCTURE also identified 252 AQUA individuals that had more than 5% admixture from cluster 2 (Fig. S2) and these individuals were also deleted from our dataset. This left 687 individuals from five AQUA populations, 98 individuals from the TOB_WILD population, and 100 individuals from the STW_WILD population (Table 1) to be used in genetic distance calculations and outlier loci analyses.
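The 5% admixture cutoff amounts to a threshold on each individual's estimated membership coefficient for the European cluster. The sketch below illustrates that step only; the function name, the individual labels, and the coefficients are hypothetical and are not taken from the dataset.

```python
def remove_admixed(european_membership, threshold=0.05):
    """Return the individuals whose European-cluster membership is <= threshold.

    european_membership maps an individual ID to its estimated fraction of
    ancestry in the European cluster (a value between 0 and 1).
    """
    return {
        individual: q
        for individual, q in european_membership.items()
        if q <= threshold
    }


# Invented coefficients for three hypothetical fish.
toy_membership = {"fish_1": 0.01, "fish_2": 0.12, "fish_3": 0.05}
retained = remove_admixed(toy_membership)
```

In this sketch an individual at exactly the threshold is retained, matching the wording "more than 5%" for removal.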

| Pairwise genetic distances among sampled populations
As expected, STW_WILD from the iBoF was more distantly related to all AQUA populations (F ST = 0.101-0.132; Table 2). All pairwise population comparisons were significant at p < .001 (Footnote 1 of Table 2).

| Outlier detection-TOB_WILD versus AQUA
Method 3, which used the hierarchical Arlequin analysis (Table S1), found a total of 54 "high" F CT outlier loci that were putatively under divergent selection between the AQUA and TOB_WILD populations. SGoF+ verified that all 54 outliers had q-values <0.05. Of these 54 divergent F CT outlier loci, 31 were also significantly divergent F ST outlier loci. Both types of outlier loci were well dispersed throughout the genome (Figure 3a,b). Most strikingly, 23 of the F CT outlier loci did not have highly significant p-values (α = 0.01) in the nonhierarchical F ST analysis and thus were completely new outliers (Figure 3a; Appendix S5).
The set of F ST outliers found by Method 1 (nonhierarchical FDIST2) and the set found by Method 3 (hierarchical) strongly overlapped (Figure 4; Appendices S1 and S5). Only six outlier loci were found solely in the analysis using Method 1 (Figure 4; Appendices S1 and S5). Both of the parallel loci discovered by the Method 2 pairwise analyses, GCR_cBin15233_Ctg1_136_V2 and GCR_cBin27732_Ctg1_177, were also found by the Method 1 and Method 3 analyses (Figure 4; Appendices S1 and S5).
Methods 1, 2, and 3 found a total of nine unique very highly significant outlier loci with p-values of <10^−4 (Figures 1a, 2 and 3). Three of these highly significant outlier loci were classified as "consistent" (ssa01p/23: rs159404664) was annotated as palmitoyl transferase zdhhc7-like (Table 3), and (iii) GCR_cBin27732_Ctg1_177 was one of only two parallel outlier loci that were significant for all six Method 2 pairwise comparisons in Arlequin using FDIST2 (Figure 2; Appendix S3).
In addition to these three, a total of 17 "consistent" F ST outlier loci were found in common among Method 1, Method 2, and Method 3 that compared the AQUA populations with the TOB_WILD population (Figure 4, Table 4). All 17 were also significant F CT outlier loci (Appendix S5). Three of these 17 "consistent" outlier loci were also found by BayeScan (GCR_cBin15233_Ctg1_136_V2, ESTNV_24797_128, and GCR_cBin4356_Ctg1_1956; Tables 3 & 4).

(Slatkin, 1995) as implemented in Arlequin 3.5 (Excoffier & Lischer, 2010). All genetic distance measures are highly significant in a permutation test with 10,100 permutations at p = .00000 ± .0000 except for # which had p = .00762 ± .0009.
b Nei's mean number of pairwise differences between pairs of populations is a method of measuring genetic distance when the characters are binary (Nei & Li, 1979) as implemented in Arlequin 3.5 (Excoffier & Lischer, 2010). All genetic distance measures are highly significant in a permutation test with 10,100 permutations at p = .00000 ± .0000 except for # which had p = .00762 ± .0009.
c Full population names corresponding to these abbreviations are given in Table 1.
d All genetic distance calculations were performed on a reduced dataset after individuals shown by STRUCTURE to have European ancestry had been removed.
Many outlier loci found using all four methods were homologous to known proteins and could be located on the linkage maps and on the physical map of the Atlantic salmon genome (Table 3). Protein homologies were found for all 17 "consistent" outlier loci that had "EST" prefixes, indicating these SNPs were discovered by aligning expressed sequence tags (Table 3). Many outlier loci with a prefix of "GCR" (genome complexity reduction sequences) could not be identified by BLASTx, perhaps because the available DNA sequence containing the SNP was too short to conclusively identify a protein or because they were in noncoding regions.

| Outlier detection-STW_WILD versus AQUA
The Method 1 nonhierarchical FDIST2 Arlequin analysis of the STW_WILD population and the six AQUA populations detected 54 outlier loci (Appendix S4) putatively under divergent selection.
Just one of these (GCR_cBin8095_Ctg1_103) was also detected by both hierarchical and nonhierarchical Arlequin analyses of the AQUA populations with the TOB_WILD population (Appendices S1 and S5). This SNP was in the coding region of "protein phosphatase 1 regulatory subunit1B-like" adjacent to the protein "stAR-related lipid transfer protein 3-like isoform X2" (

| Colocalization between outlier loci, QTLs, and candidate genes for production traits
Many of the 17 "consistent" outlier SNP loci from the AQUA versus TOB_WILD outlier analyses were on the same chromosome arm as candidate loci or published QTLs for growth, maturity, or immune traits (Table 4). For example, four outlier loci were located in or near candidate genes on the physical map that are part of the immune system. One outlier SNP locus was in a MHC gene (ESTNV_24797_128), ESTNV_16810_167 was near the candidate gene MHC class IIb and colocalized with previously published outliers, and two others (GCR_cBin4356_Ctg1_1956 and ESTNV_17881_371) were near candidate loci that are part of the immune system (Table 4). Of even more interest were the three outlier loci located near candidate genes on the physical map associated with appetite and feeding. ESTNV_35352_77 was adjacent to a carbohydrate response element, GCR_cBin27732_Ctg1_177 was near melanocortin receptor 4, and ESTV_14714_122 was near somatostatin receptor 5 (Table 4).
Manhattan plots showed that some highly significant F CT outliers (on Ssa13, Ssa19 and Ssa23) did occur in the same regions of the linkage map as published QTL for growth and maturity (Figure 3c: j13, b14, c19, d19, b23) but that other F CT outliers (on Ssa6) did not (Figure 3c). There was no overall correlation between the position of the outlier loci and SNP markers for published QTL for weight or maturity. The chi-square test of association that compared the number of outlier loci and the number of QTL in each chromosome bin was not significant (p = 1.0). In addition, the Pearson correlation between the maximum −log10 p value of a QTL and the maximum −log10 p value of an F CT outlier locus was not significant (Table S3).
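The bin-level correlation described above can be sketched with a plain Pearson coefficient computed over paired per-bin values. The function name and the numbers below are hypothetical, the sketch does not compute a significance test, and it assumes each chromosome bin contributes one pair of values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Invented per-bin maxima of -log10(p), for illustration only.
toy_qtl_scores = [1.0, 2.0, 3.0, 4.0]
toy_outlier_scores = [2.0, 4.0, 6.0, 8.0]
toy_r = pearson_r(toy_qtl_scores, toy_outlier_scores)
```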
However, the correlation between the maximum −log10 p value of a QTL and the maximum number of studies that had detected it was significant (Table S3).

Non-hierarchical Arlequin outlier analysis (Appendix S1) was done using the nonhierarchical option and included five AQUA populations:

| Outlier loci can identify major locus traits for production traits
We hypothesized that fundamental questions about the genetic basis of economically important traits could be addressed by comparing allele frequencies at 3980 SNP loci between a farmed strain of Atlantic salmon established five generations ago and its wild founder population. We found a total of 17 "consistent" high-F ST outlier loci (Figure 4): three loci were in or near areas of the genome that contained candidate loci affecting appetite or metabolism and therefore potentially affecting growth, and four loci were in or near candidate genes conferring resistance to particular diseases or parasites (Table 4). We qualitatively detected some outlier loci that colocalized with candidate loci and with QTLs for growth even though the overall associations between outlier loci and QTL were not significant. Since their independent establishment, the four independent AQUA populations (Table 1) have each experienced artificial selection for fast growth rate and a low incidence of early sexual maturation beginning in the early 1980s (O'Flynn et al., 1999;Quinton et al., 2005; J. A. K. Elliott, unpublished data) and "accidental" domestication selection for adaptation to hatchery conditions since 1968. Genetic gains in quantitative traits such as weight at age are cumulative and have been shown to be measurable statistically in Atlantic salmon after only one generation of selection (Friars et al., 1995). Bourret, Dionne, and Bernatchez (2014) argued that natural selection on polygenic traits like survival in salmon would be difficult to detect with single outlier loci and proposed a multivariate solution. In turbot, partial colocalization was observed between candidate genes for growth and previously published QTLs, whereas high colocalization was observed for candidate genes associated with immune response and previously published QTLs for resistance to disease or parasites (Figueras et al., 2016).
Simulations of traits determined by 1 to 10 QTL suggest that outlier "SNP"-like loci physically linked to QTL are more likely to show colocalization with the nearest QTL than randomly chosen "SNP" loci but that false positives are common (Vilas et al., 2012).
Studies with livestock show that genome scans are more likely to detect signatures of selection for single locus traits, such as coat color, the absence of horns, or immune traits, than for polygenic traits, such as milk yield and growth. This is true even in species like cattle, which have enormous pedigreed datasets from high-density SNP chips, well-annotated genomes, and precisely mapped QTLs for production traits (Kemper, Saxton, Bolormaa, Hayes, & Goddard, 2014;Stella, Ajmone-Marsan, Lazzari, & Boettcher, 2010) (Gautier & Naves, 2011;Qanbari et al., 2011), coat color, and horn development (Druet, Pérez-Pardal, Charlier, & Gautier, 2013) but also have detected signatures for immune responses (Gautier & Naves, 2011). In pigs, selection signatures have been identified in genomic regions associated with selected traits such as coat color, ear morphology, reproductive characteristics, and fat deposition (Wilkinson et al., 2013), adaptation to low temperatures (Ai et al., 2015), and also immune traits (Yang, Li, Li, Fan, & Tang, 2014). In chickens, researchers have identified 82 selection signatures associated with traits under artificial selection such as eggshell hardness but have also detected those associated with immune system characteristics (Qanbari et al., 2012).

| Outliers in similar genome regions among studies?
Our study found a set of SNP outlier loci that did not overlap with those found by two previous studies that compared similar wild and aquaculture strains of North American Atlantic salmon from the SJR system. Vasemägi et al. (2012) found four outlier loci of a 320 SNP dataset consisting of six fish from the Nashwaak tributary of the […] year class SJR aquaculture strain (Quinton et al., 2005). All four of their outlier loci were also successfully genotyped by Mäkinen et al. (2015) and by our study but were not identified as outliers in either case. On the other hand, some of the outlier loci that differ between studies are in the same region of the genome. For example, the outlier BASS113_B6A_E01_397 that was found by Mäkinen et al. (2015) is within 3 cM of one of our consistent outlier SNPs (ESTNV_16810_167, Table 4) that is located on chromosome Ssa12.

FIGURE 4 A Venn diagram with the circles representing the three different methods of outlier analysis comparing the TOB_WILD population with the AQUA populations using Arlequin 3.5, showing: (i) the one overlapping subset of outlier loci found by all three methods of analysis, (ii) the three overlapping subsets of outlier loci that were found only by two methods of analysis, and (iii) the three subsets found exclusively by one method of analysis. The 17 loci found by all three methods are shown in Table 4.

TABLE 4 Comparison between overlapping sets of 17 outlier loci found by all three different methods (a) of outlier analysis that compared the TOB_WILD population with the AQUA populations using Arlequin 3.5 (Figure 4).
a Nonhierarchical AMOVA (Appendix S1), pairwise AMOVA (Appendix S3), and hierarchical AMOVA (Appendix S5).
b "Chr" means the chromosome number on which the outlier locus was located.
We identified about 1% of the SNPs we genotyped as outlier loci even after FDR was considered. This is similar to previous studies of wild fish populations that typically identify between 1% and 10% of the loci surveyed as outlier loci (Bradbury et al., 2013;Narum et al., 2010). We recognize that not all our outlier loci are necessarily a result of differences in local extrinsic selection. Recent theoretical work has shown that various factors other than selection can influence heterogeneous divergence of genomic regions among populations including demographic history, geographic structure, and close physical linkage (Klopfstein, Currat, & Excoffier, 2006).
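To make the false discovery rate (FDR) step concrete, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-locus p-values. The text above does not state which FDR procedure was applied, so this choice, the function name, and the default level of 0.05 are assumptions for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking loci significant at FDR level alpha."""
    m = len(p_values)
    # Indices of the loci ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Every locus ranked at or below max_rank is declared significant,
    # even if its own p-value missed its own threshold (step-up rule).
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant
```

With several thousand SNPs tested, a correction of this kind limits the expected share of false positives among the reported outliers, but it does not separate selection from the demographic and linkage effects discussed above.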
As expected, all AQUA populations were characterized by smaller genetic distances from the TOB_WILD population than from the STW_WILD population, supporting the hypothesis that TOB_WILD was the population most closely related to the AQUA founder population.

| Correlation between outlier loci and major QTL for sea age
Although comparisons of closely related populations are usually most useful for detecting outlier loci under diversifying selection (Li et al., 2013), the analysis of STW_WILD versus AQUA identified an outlier locus correlated with a major QTL associated with age at sexual maturity. STW_WILD is an iBoF population that has more one-sea winter spawners (grilse), whereas TOB_WILD is an oBoF population that has more multisea winter spawners (P. O'Reilly, pers. comm. to one previously associated with sea age in wild Atlantic salmon populations (Johnston et al., 2014) that was not significant in a second GWAS after accounting for population structure (Barson et al., 2015). It also identified a consistent outlier GCR_cBin15233_Ctg1_136_V2 on Ssa15 near the TSHR (the thyrotropin receptor) (Table 4). TSHR induces final sexual maturity in salmonids and is considered analogous to LH (luteinizing hormone) in vertebrates (Oba, Hirai, Yoshiura, Kobayashi, & Nagahama, 2001). Genetic changes that resulted in delayed binding of thyrotropin to the receptor (Oba et al., 2001) could have been selected for under strong artificial selection for delayed sexual maturity experienced by the AQUA strain.

| Admixture removal before outlier detection
Searching for outlier loci in North American populations without first removing fish with putative European ancestry could have led to the detection of a large number of false-positive outlier loci, because average allele frequencies differ greatly between the two continental groups at the loci on the same 6K SNP chip that was used here.
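The removal step can be sketched as a simple filter on each individual's estimated European ancestry coefficient from a K = 2 STRUCTURE run. The dictionary layout, sample identifiers, and function name below are hypothetical; the cutoff of more than 5% European ancestry is the one applied in this study.

```python
def filter_admixed(q_european, threshold=0.05):
    """Split sample IDs by estimated European ancestry.

    q_european maps a (hypothetical) sample ID to the proportion of its
    genome assigned to the European cluster. Individuals with a proportion
    strictly greater than threshold are removed.
    """
    kept, removed = [], []
    for sample_id, q in q_european.items():
        if q > threshold:
            removed.append(sample_id)
        else:
            kept.append(sample_id)
    return kept, removed


# Toy ancestry coefficients, invented for illustration.
toy_q = {"fish_01": 0.00, "fish_02": 0.05, "fish_03": 0.051, "fish_04": 0.90}
kept, removed = filter_admixed(toy_q)
```

Only the retained individuals would then enter the genetic distance and outlier analyses.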
We believe that the well-documented use of fish with European ancestry in salmon breeding populations in Maine, just south of the Bay of Fundy region where SGRP was developed (Glebe, 1998), justifies the use of our K = 2 (Fig. S1) STRUCTURE results (Fig. S2) to remove individuals with more than 5% European ancestry from the subsequent genetic distance and outlier analyses. We acknowledge that, because of the small sample size of known pure and hybrid European fish included in our original STRUCTURE analysis, we were initially not confident that all of the 252 excluded fish had European ancestry. We also acknowledge that variation in our population sample size, which ranged from 8 to 268 for different populations in this study (Table 1)

| Gene ontology of outlier loci and nearby candidate genes
The historical objective of the selection program for Atlantic salmon in the SJR has been rapid growth rate and late sexual maturity (Quinton et al., 2005; Saunders, 1981). Direct selection for polygenic traits, such as growth rate and late maturation, may result in indirect selection on a wide variety of biological pathways, including those affecting appetite, metabolism, and immune responses to specific pathogens.
Strong selection for high growth rates could select for fish that spend more time eating. This mechanism could be responsible for the allele frequency differences between the AQUA and founder populations at the consistent outlier GCR_cBin27732_Ctg1_177, which is located near melanocortin receptor 4 (Table 4). Melanocortin peptides affect motivation to feed in rainbow trout (Schjolden, Schiöth, Larhammar, Winberg, & Larson, 2009) and have previously been found to be associated with weight gain and feed intake in pigs (Fontanesi et al., 2015). Two other consistent outlier loci are located near candidate genes associated with growth and metabolism.
High levels of somatostatin are known to reduce growth rates in rainbow trout by lowering plasma levels of growth hormone (GH), insulin-like growth factor-1 (IGF-I), and insulin (Very, Kittilson, Klein, & Sheridan, 2008). Alleles that reduce levels of somatostatin might therefore be indirectly selected for under strong artificial selection for increased growth rates. ESTNV_35352_77 is adjacent to a carbohydrate response element transcription factor that regulates carbohydrate metabolism in the liver and pancreatic β-cells of humans in response to high levels of glucose (Noordeen et al., 2010). Selection for high growth rates on a high-carbohydrate feed diet could result in genetic changes that have affected the expression levels of this transcription factor. Taken together, these changes at multiple outlier loci may result in more efficient feed utilization by the SJR AQUA strain. Similar changes may explain the higher growth rate of a line of Atlantic salmon selected for fast growth relative to wild salmon despite its lower feed consumption (Thodesen, Grisdale-Helland, Helland, & Gjerde, 1999).
Several outlier loci detected here are in (ESTNV_24797_128, to be involved in disease resistance (Gómez, Conejeros, Consuegra, & Marshall, 2011; Harstad, Lukacs, Bakke, & Grimholt, 2008). Allele frequency changes between the founder and AQUA populations may have arisen through accidental selection when "ranched" wild fish were spawned at high densities in hatcheries (Fleming, Agustsson, Finstad, Johnsson, & Björnsson, 2002; Tonteri, Vasemägi, Lumme, & Primmer, 2010) after the building of the Mactaquac hatchery on the SJR. There was never any deliberate artificial selection for disease resistance as part of the SGRP breeding program (Quinton et al., 2005) that could account for these changes, but there may have been accidental selection.

| Practical applications of identifying SNPs associated with growth and immune traits
This study detected two parallel and 17 consistent high-FST outlier SNP loci between the AQUA and TOB_WILD populations, many of which were associated with candidate genes for production traits. We are planning to incorporate SNPs in candidate genes for growth (somatostatin receptor 5), appetite (melanocortin receptor 4; carbohydrate response elements), sea age (vgll3; TSHR), and immune traits (MHC class II) into low-density SNP assays so that the putative relationships with economically important traits can be validated in larger populations. The density of the 6K chip used here was too low to effectively detect fine-scale patterns of among-population divergence and within-population homozygosity in response to population-specific directional selection. More precise detection of long haplotype blocks from selective sweeps is now possible using the 220K SNP chip and the high-density 132K SNP chip that have been developed for European Atlantic salmon.
To distinguish mere linkage to a causal gene from evidence that a gene actually determines a quantitative trait, future eQTL work investigating the regulation of expression of candidate genes near the outlier loci discovered here might reveal their functional role in biological pathways. In this way, it may be possible to systematically elucidate how divergent selection between wild and farmed populations acts on economically important quantitative traits. Relationships between the outlier SNPs and published QTLs will continue to be tested in more precise QTL-mapping and GWAS studies using higher density SNP chips. Such studies would facilitate the application of modern marker-assisted selection for the genetic improvement of Atlantic salmon stocks (e.g., Moen & Ødegård, 2014). Ultimately, a deeper understanding of artificial selection in Atlantic salmon and other recently domesticated species may inform how the genetic architecture of quantitative traits affects the mechanisms of evolutionary change in natural populations.