Characterization of natural variation in North American Atlantic Salmon populations (Salmonidae: Salmo salar) at a locus with a major effect on sea age

Abstract
Age at maturity is a key life-history trait of most organisms. In anadromous salmonid fishes such as Atlantic Salmon (Salmo salar), age at sexual maturity is associated with sea age, the number of years spent at sea before the spawning migration. For the first time, we investigated the presence of two nonsynonymous vgll3 polymorphisms in North American Atlantic Salmon populations that relate to sea age in European salmon and quantified the natural variation at these and two additional candidate SNPs from two other genes. A targeted resequencing assay was developed, and 1,505 returning adult individuals of size-inferred sea age and sex from four populations were genotyped. Across three of four populations sampled in Québec, Canada, the late-maturing component (MSW) of the population of a given sex exhibited higher proportions of SNP genotypes 54Thr vgll3 and 323Lys vgll3 than early-maturing fish (1SW); for example, 85% versus 53% of females from the Trinité River carried 323Lys vgll3 (n MSW = 205 vs. n 1SW = 30; p < .001). However, the association between vgll3 polymorphism and sea age was more pronounced in females than in males in the rivers we studied. Logistic regression analysis of vgll3 SNP genotypes revealed increased probabilities of exhibiting higher sea age for 54Thr vgll3 and 323Lys vgll3 genotypes compared to alternative genotypes, depending on population and sex. Moreover, individuals carrying the heterozygous vgll3 SNP genotypes were more likely (>66%) to be female. In summary, two nonsynonymous vgll3 polymorphisms were confirmed in North American populations of Atlantic Salmon, and our results suggest that variation at those loci correlates with sea age and sex. Our results also suggest that this correlation varies among populations. Future work would benefit from more balanced sampling and from adding data on juvenile riverine life stages for comparison with our data.

KEYWORDS
dual-index barcoding, genes of large effect, Illumina® MiSeq, life history, paired-end amplicon sequencing, single nucleotide polymorphisms

| INTRODUCTION
One of the largest contemporary challenges is to balance the availability of biological resources with the increasing demand for exploitation by a continually growing world population (Ludwig, Hilborn, & Walters, 1993). Forestry, hunting, and fisheries are classical ways to exploit biological resources. However, those resources are naturally limited in their abundance, and many species are vulnerable to overexploitation and disturbance. Exploitation and habitat disturbance of fisheries resources have increased dramatically over the last decades (Christensen et al., 2003; Limburg & Waldman, 2009; Pauly et al., 2002). Harvesting fish from natural populations reduces genetic, morphological, and life-history diversity (Allendorf, England, Luikart, Ritchie, & Ryman, 2008; Conover & Munch, 2002; Kuparinen & Merilä, 2007). Consequently, a thorough understanding of key biological traits in exploited fish populations, that is, major traits that characterize an individual's fitness, life history, or age, can be crucial for the persistence and management of this valuable aquatic resource.
Fish stock biodiversity has traditionally been monitored by assessing general abundance, size at age, and morphological and life-history variation (Hilborn & Walters, 1992; Pauly et al., 2002). In addition to phenotypic approaches, putatively neutral genetic markers have served in numerous cases to delineate population structure and diversification (Cuéllar-Pinzón, Presa, Hawkins, & Pita, 2016; Hauser & Seeb, 2008). Genomic approaches are currently revolutionizing our understanding of population structuring and diversification processes, and next-generation sequencing technologies increasingly help to resolve fundamental and applied questions in fisheries and aquaculture. For example, approaches based on single nucleotide polymorphisms (SNPs) allow researchers to identify genes and genomic regions associated with major phenotypic traits, to discriminate species and fish stocks, or to trace fish and fisheries products along the supply chain (Bernatchez, 2016; Cuéllar-Pinzón et al., 2016; Elmer, 2016; Hemmer-Hansen, Therkildsen, & Pujolar, 2014; Valenzuela-Quiñonez, 2016). Molecular genetic approaches open up unprecedented avenues to manage and restore natural variation in exploited species, in particular if genetic markers are associated with key biological traits (Allendorf et al., 2008).
The Atlantic Salmon (Salmo salar) is an exploited migratory fish species of immense economic and cultural importance for the nations bordering the North Atlantic. However, throughout its native range in the Northern Hemisphere, the species is in serious peril. The International Council for the Exploration of the Sea has documented a dramatic decline of Atlantic Salmon over the last decades (Chaput, 2012; ICES, 2013). Various factors likely contributed to this reduction in overall abundance, including numerous anthropogenic impacts (Limburg & Waldman, 2009; Otero et al., 2011) such as overexploitation, not only by offshore fisheries on the open sea but also in riverine areas, where the fish return from their sea migration to spawn. For instance, the marine mortality of Atlantic Salmon in European populations increased from 70% in the 1970s to over 90% in 2005 (Friedland et al., 2009). The WWF estimates that of 2,005 traditionally salmon-bearing river populations, more than 40% are threatened and 15% have already gone extinct (World Wildlife Fund, 2001). Many of the remaining populations exhibit substantially reduced genetic and phenotypic diversity (King et al., 2007). Previous work documented the evolutionary consequences of extensive size-selective harvesting, which frequently manifest as altered life-history traits in the population, such as reduced growth rates, smaller overall body size, and an earlier onset of sexual maturation (Jorgensen et al., 2007; Kuparinen & Merilä, 2007; Quinn, McGinnity, & Cross, 2006). For example, in many wild populations, the age at which individuals return from their marine migration to spawn has decreased significantly in the last decades (Friedland et al., 2009; Otero et al., 2011).
The life cycle starts with spawning in freshwater rivers. Various juvenile phases (alevin → fry → parr → smolt) can be found in riverine nursery grounds. After several years in the riverine habitat, the smolts migrate to their feeding grounds at sea. Atlantic Salmon spend either one winter (1SW) or multiple winters (MSW; up to five recorded) at sea before they mature and return to their river of origin to spawn, completing their life cycle. Repeated spawning migrations in subsequent years (iteroparity) occur at low frequencies of ca. 11% but vary among populations (Fleming, 1998). The number of winters spent at sea before maturation is termed sea age at maturity, hereinafter "sea age." Sea age has been considered crucial for sustainable population persistence over multiple generations in another salmonid, the Sockeye Salmon (Oncorhynchus nerka) (Schindler et al., 2010), and a broad distribution of sea age in a population is a reliable predictor of genetic diversity in Atlantic Salmon (Vähä, Erkinaro, Niemelä, & Primmer, 2007). Decreasing sea age also reduces the value of populations for angling, because anglers are mostly attracted to the larger MSW fish. In summary, sea age defines population diversity and structure to a large extent and has great potential to serve as a reference trait for biodiversity assessments and for the management of exploited Atlantic Salmon populations.
Recently, variation in sea age in European Atlantic Salmon populations has been attributed to a major selective sweep containing the candidate gene vgll3 (vestigial-like family member 3), a 4-kb gene located on chromosome 25 that consists of four exons of 460, 440, 570, and 600 bp and has no splice variants (Ayllon et al., 2015; Barson et al., 2015). A combined approach of a SNP-based genome-wide association study on 1,404 individuals from 57 populations and whole-genome resequencing of 32 individuals revealed that vgll3 explained 39% of the variation in sea age (Barson et al., 2015). Moreover, an independent study on both domesticated and wild Atlantic Salmon from Western Norway narrowed the candidate region down to a 2.4-kb stretch within the vgll3 gene (Ayllon et al., 2015). However, the causative SNPs or other causal variants remain to be elucidated.
The two alleles at the vgll3 locus are associated either with early (E) maturation and thus low sea age, or with late (L) maturation and higher sea age (Ayllon et al., 2015;Barson et al., 2015). Moreover, sex-dependent dominance of E and L alleles was documented (Barson et al., 2015). The E allele promoting earlier maturation is most abundant in males (EE most frequent, EL moderate, LL most rare), whereas in females the L allele promoting delayed maturation seems to be selected for (EE most rare, EL moderate, LL most frequent) (Barson et al., 2015). Interestingly, vgll3 is also associated with the onset of puberty and adiposity in humans (Cousminer et al., 2013), suggesting that this gene might be functionally highly conserved across vertebrates.
Because late-maturing fish with larger body mass (higher sea age) exhibit higher reproductive success and females mature later than males on average (Fleming & Einum, 2010), sea age is a trait of great interest to understand sexual conflict, that is, sex-differential selection on a trait with a common genetic basis.
Two strongly linked nonsynonymous SNPs were independently identified within vgll3 (Met54Thr vgll3 and Asn323Lys vgll3) (Ayllon et al., 2015; Barson et al., 2015) and were predicted to alter protein function and structure (Barson et al., 2015). 1SW males predominantly exhibited methionine and asparagine at amino acid positions #54 and #323 in the vgll3 protein, respectively, whereas 3SW males more likely exhibited threonine and lysine at those positions; Met54Thr vgll3 and Asn323Lys vgll3 explained 33% and 36% of the variation in sea age, respectively (Ayllon et al., 2015). Besides vgll3, two additional, less strongly supported candidate genes for sea age were identified: akap11 (A-kinase anchor protein 11) (Ayllon et al., 2015; Barson et al., 2015) and six6 (Barson et al., 2015). Akap11 is located on the same chromosome as vgll3 (chromosome 25) and within the main selective sweep associated with sea age (Ayllon et al., 2015; Barson et al., 2015), whereas six6 is situated in a region on chromosome 9 that is in strong linkage disequilibrium with multiple other genes (Barson et al., 2015). In humans, the expression of akap11 correlates with spermatogenesis (Vijayaraghavan et al., 1999), whereas six6, a transcription factor involved in the hypothalamic-pituitary-ovarian axis, is associated with pubertal height, growth, and age at maturity (Perry et al., 2014). A nonsynonymous mutation predicted to change protein structure was characterized in akap11 (Val214Met akap11) (Barson et al., 2015). 1SW fish predominantly exhibited the Val214 variant of akap11, whereas MSW fish carried the Met214 variant (Ayllon et al., 2015). An additional SNP marker, termed SIX6 TOP by Barson et al. (2015) because of its proximity to the six6 candidate gene, was associated with sea age in the pooled sample of 57 populations, but not after correction for population stratification. SIX6 TOP explained 5% and 3% of the size variation within sea age categories in females and males, respectively (Barson et al., 2015).
In this study, we assessed for the first time the natural variation at the previously identified loci associated with sea age in North American Atlantic Salmon populations. The nonsynonymous single nucleotide variants in the candidate genes vgll3 and akap11 were of particular interest because of their potential to alter protein function and structure and thus their putative biological significance for sea age.
We developed targeted resequencing assays for the nonsynonymous mutations Met54Thr vgll3, Asn323Lys vgll3, and Val214Met akap11, as well as for SIX6 TOP, and related genetic and phenotypic variation at the individual level in four Canadian rivers that differ in their 1SW to MSW ratios. The rationale was to determine whether these candidate SNPs exist in North American populations and how they are associated with sea age.
[…] in the framework of annual monitoring censuses and previous studies (Milot, Perrier, Papillon, Dodson, & Bernatchez, 2013; Richard, Dionne, Wang, & Bernatchez, 2013). These rivers harbor different average levels of sea age at maturity. The Malbaie and Escoumins rivers have relatively low proportions of 1SW fish (23% and 38% of the returning adult individuals, respectively), whereas 1SW fish make up on average 63% and 90% of returning adults in the Trinité and Vieux-Fort rivers, respectively, over five recent consecutive years (Cauchon, 2015). Fish were caught with seine nets, and fork length was measured to the nearest cm in the field. A small piece of fin tissue was preserved in 97% ethanol for later DNA extraction in the laboratory.

| Sample collection and selection of data set
1SW and MSW assignment was based on fork length, the distance from the tip of the snout to the most distal point of the caudal fin rays (Fig. S1). A fork length of 63 cm is an established empirical threshold to discriminate between 1SW and MSW fish in Québec (MFFP 2016). Individuals measuring 63 ± 5 cm were excluded from the study to further reduce ambiguity in the assignments; that is, individuals <58 cm were assigned to the 1SW group and those >68 cm were assigned to the MSW group. Sample sizes were limited to n = 50 for 1SW fish and n = 100 for MSW fish per sampling year and river (Table S1), such that in total 1,505 individuals (653 1SW and 852 MSW) were genotyped. Based on scale readings, a total of 403 fish were determined to be 1SW and 397 fish to be 2SW. All 397 2SW fish (scale readings) were also classified as MSW by fork length, whereas 394 of the 403 1SW fish (scale readings) were also classified as 1SW by fork length. Multiple spawners, as inferred from scale readings, were excluded entirely, but it is conceivable that a small number of multiple spawners and >3SW fish remain among the MSW fish for which no scale readings were available.
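The size-based assignment rule and its validation against scale readings reduce to a threshold with an exclusion window plus a simple agreement fraction. The Python sketch below assumes fork lengths in whole centimetres; the function names are hypothetical, and the only data used are the scale-reading counts quoted above (394 of 403 and 397 of 397 fish agreeing), which give an overall agreement of 791/800, or about 98.9%.

```python
# Size-based sea-age assignment as described above (fork length in whole cm).
THRESHOLD_CM = 63  # empirical 1SW/MSW threshold used in Québec
MARGIN_CM = 5      # fish within 63 +/- 5 cm are excluded as ambiguous


def classify_sea_age(fork_length_cm):
    """Return "1SW", "MSW", or None for fish excluded as ambiguous (hypothetical helper)."""
    if fork_length_cm < THRESHOLD_CM - MARGIN_CM:
        return "1SW"
    if fork_length_cm > THRESHOLD_CM + MARGIN_CM:
        return "MSW"
    return None  # 58-68 cm: excluded from the study


def agreement(matches, totals):
    """Overall fraction of scale-aged fish whose size-based class matches the scale reading."""
    return sum(matches) / sum(totals)


# Counts from the text: 394 of 403 scale-1SW fish and 397 of 397 scale-2SW fish agree.
overall = agreement(matches=[394, 397], totals=[403, 397])
print([classify_sea_age(cm) for cm in (55, 58, 63, 68, 72)])
print(round(overall, 3))
```

The exclusion window trades sample size for fewer misassignments; the agreement fraction only covers fish with scale readings, so it says nothing directly about the MSW fish that lacked them.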

| DNA extraction and normalization
DNA was extracted from fin clips using a salt extraction method (Aljanabi & Martinez, 1997).

| Sex determination
All individuals were sexed using polymerase chain reaction (PCR) (Quéméré et al., 2014;Yano et al., 2013). The male-specific sdY gene was targeted, and therefore during agarose gel electrophoresis, only male individuals and the positive control showed a fluorescent band.
Subsequently, the cleaned paired-end reads were merged using the program FLASH (Magoč & Salzberg, 2011) with a minimum overlap of 50 bp and a maximum overlap of 250 bp. The reads were then mapped with the bwa mem algorithm of the Burrows-Wheeler Aligner software package (Li, 2013) to a reference consisting of the amplicon sequences ±50 bp (VGLL3_323 ± 50 bp, VGLL3_54 ± 50 bp, AKAP11 ± 50 bp, SIX6 ± 50 bp) that were used for primer design. The program samtools (Li et al., 2009) was used to convert the sequence alignment files (SAM format) into sorted binary alignment files (BAM format) and to index the BAM files. SNP variants were then called using the samtools mpileup command with parameters -uD, -E, and -f (Li et al., 2009).
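For readers setting up a similar workflow, the per-sample steps above can be laid out as shell commands. This Python sketch only builds the command strings and does not run them; the file names and the one-time `bwa index` step are assumptions, whereas the tools and the quoted options (FLASH `-m 50 -M 250`, `bwa mem`, `samtools sort`/`index`, `samtools mpileup -uD -E -f`) come from the text. The BCF-output options of `samtools mpileup` belong to older samtools releases; newer releases moved this functionality to `bcftools mpileup`.

```python
import shlex


def build_pipeline(sample, read1, read2, reference):
    """Return shell command strings for one sample, mirroring the steps in the text.

    File-naming conventions are hypothetical; FLASH writes merged pairs to
    <prefix>.extendedFrags.fastq.
    """
    prefix = f"{sample}.flash"
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    return [
        # One-time index of the amplicon (+/- 50 bp) reference for BWA.
        shlex.join(["bwa", "index", reference]),
        # Merge overlapping read pairs (min overlap 50 bp, max overlap 250 bp).
        shlex.join(["flash", "-m", "50", "-M", "250", "-o", prefix, read1, read2]),
        # Map merged reads with BWA-MEM; SAM is written to standard output.
        shlex.join(["bwa", "mem", reference, f"{prefix}.extendedFrags.fastq"])
        + f" > {shlex.quote(sam)}",
        # Sort into BAM, then index the BAM.
        shlex.join(["samtools", "sort", "-o", bam, sam]),
        shlex.join(["samtools", "index", bam]),
        # Pileup with the options named in the text (uncompressed BCF output).
        shlex.join(["samtools", "mpileup", "-uD", "-E", "-f", reference, bam])
        + f" > {shlex.quote(sample + '.bcf')}",
    ]


for command in build_pipeline("fish001", "fish001_R1.fastq", "fish001_R2.fastq", "amplicons.fa"):
    print(command)
```

Building commands as argument lists and joining them with `shlex` keeps file names safely quoted if sample identifiers ever contain spaces or shell metacharacters.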

| Correlation of SNP variation with sea age
Although more than 1,500 individuals were investigated, the genotype sample sizes for some groups (notably 1SW females and MSW males in some rivers) were small, and for some groups no data were available (Table 1). This low representation of 1SW females and MSW males in some rivers reflects the demographic reality of those populations (Cauchon, 2015). To infer systematic patterns of SNP genotype variation associated with sea age and sex, we calculated the proportions of individuals carrying the common SNP genotype relative to alternative genotypes, which facilitated comparisons across populations.
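The proportion calculation above amounts to grouping individuals by river, sex, and sea age and counting the focal genotype in each group. The Python sketch below uses hypothetical records (river "RiverA" and all counts are invented for illustration) and reports each group's size n next to its proportion, since several groups in the study were small.

```python
from collections import Counter, defaultdict


def expand(river, sex, sea_age, **genotype_counts):
    """Turn per-genotype counts into one record per individual (illustration only)."""
    return [(river, sex, sea_age, genotype)
            for genotype, count in genotype_counts.items()
            for _ in range(count)]


def genotype_proportions(records, focal="LL"):
    """Map (river, sex, sea_age) -> (proportion carrying `focal`, group size n)."""
    counts = defaultdict(Counter)
    for river, sex, sea_age, genotype in records:
        counts[(river, sex, sea_age)][genotype] += 1
    return {group: (c[focal] / sum(c.values()), sum(c.values()))
            for group, c in counts.items()}


# Hypothetical records for one river; genotypes coded "LL", "EL", "EE".
records = (expand("RiverA", "F", "MSW", LL=12, EL=6, EE=2)
           + expand("RiverA", "F", "1SW", LL=1, EL=3, EE=1)
           + expand("RiverA", "M", "1SW", LL=9, EL=9, EE=2))

for group, (prop, n) in sorted(genotype_proportions(records).items()):
    print(group, f"LL = {prop:.2f}", f"n = {n}")
```

Keeping n alongside each proportion makes it obvious when a striking difference (such as 0.20 versus 0.60 here) rests on a group of only five fish.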
To test whether the differences in genotype distributions between 1SW and MSW fish of each sex and population were representative rather than artifacts of unequal sample sizes (Table 1), we conducted chi-square tests with and without correction for multiple testing. Chi-square tests were also used to test for statistically significant differences in genotype distributions between sexes within a sea age class.
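A chi-square test of independence on a sea age × genotype contingency table can be sketched without external libraries, because the chi-square tail probability has closed forms for one and two degrees of freedom (the 2 × 2 and 2 × 3 tables relevant here). The counts below are hypothetical, and the Bonferroni adjustment is an assumption: the text does not name the multiple-testing correction that was used.

```python
import math


def chi_square_2xk(table):
    """Pearson chi-square test of independence for a 2 x 2 or 2 x 3 table.

    Assumes every row and column total is non-zero. Tail probabilities use
    closed forms: df = 1 via erfc, df = 2 via the exponential distribution.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))
    elif df == 2:
        p = math.exp(-stat / 2)
    else:
        raise ValueError("closed-form p-value only implemented for df = 1 or 2")
    return stat, df, p


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts, columns LL, EL, EE; rows MSW and 1SW fish of one sex and river.
table = [[100, 40, 10],
         [30, 30, 15]]
stat, df, p = chi_square_2xk(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.2e}, "
      f"Bonferroni (8 tests) p = {bonferroni(p, 8):.2e}")
```

Eight tests corresponds to, for example, four rivers × two sexes; with small groups such as the 1SW females, expected cell counts can fall below about five, where the chi-square approximation becomes unreliable and an exact test may be preferable.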
A complementary binary logistic regression, with SNP genotype as predictor and sex, year, and river as additional explanatory variables, assessed how well sea age can be predicted from SNP genotypes. The logistic regression was fitted as a generalized linear mixed-effects model with a binomial error distribution (1 = "MSW," 0 = "1SW"). A model selection process based on the Akaike information criterion (AIC) was first conducted to identify the model that best fitted our data (Burnham & Anderson, 2004) (Table S3). The models were fitted with the glmer function of the R package lme4 (Bates, 2005). The best model was then fully explored, and the estimated posterior distribution of the model parameters was assessed using the sim function of the R package arm (Gelman & Hill, 2007).
[…] (Ayllon et al., 2015; Barson et al., 2015).
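AIC-based model selection can be illustrated with a simplified, fixed-effects-only version of the logistic models described above. This is not the authors' glmer analysis: there are no random effects and no river or year terms, and the counts are hypothetical. When all predictors are fully crossed categorical factors, the maximum-likelihood fitted P(MSW) in each cell equals the observed MSW proportion, so the log-likelihood L and AIC = 2k − 2 ln L (k = number of cells) follow without iterative fitting.

```python
import math

# Hypothetical counts: (genotype, sex) -> (n_MSW, n_1SW).
COUNTS = {
    ("LL", "F"): (150, 20), ("EL", "F"): (40, 15), ("EE", "F"): (5, 5),
    ("LL", "M"): (60, 70), ("EL", "M"): (20, 35), ("EE", "M"): (5, 15),
}


def bernoulli_loglik(k, n):
    """Maximized log-likelihood of k MSW fish among n, at p = k / n."""
    if k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def aic_for_grouping(key):
    """AIC of a logistic model with one parameter per level of key(genotype, sex)."""
    groups = {}
    for cell, (msw, one_sw) in COUNTS.items():
        k, n = groups.get(key(*cell), (0, 0))
        groups[key(*cell)] = (k + msw, n + msw + one_sw)
    loglik = sum(bernoulli_loglik(k, n) for k, n in groups.values())
    return 2 * len(groups) - 2 * loglik


models = {
    "intercept only": lambda g, s: "all",
    "genotype": lambda g, s: g,
    "genotype x sex": lambda g, s: (g, s),
}
aics = {name: aic_for_grouping(key) for name, key in models.items()}
best = min(aics, key=aics.get)
for name, aic in aics.items():
    print(f"{name:15s} AIC = {aic:.1f}")
print("lowest AIC:", best)
```

Additive models (for example genotype + sex without an interaction) have no such closed form and need an iterative fit, which is what glmer provides in the actual analysis.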

| Association of genotype proportions with sea age
TABLE 1 Genetic variation at the four candidate loci, indicated by the number of individuals bearing a particular genotype. E = Early, L = Late.
54Thr vgll3 ("CC") and 323Lys vgll3 ("GG") were the most abundant genotypes in all populations, and more than 75% of the individuals carried them. Genotype proportions were calculated for each sex and sea age category to facilitate comparisons within and across populations. We found that in three of four rivers, the MSW component of the population exhibited higher proportions of the genotypes 54Thr vgll3 ("CC"), hereinafter 54Thr vgll3 ("LL"), and 323Lys vgll3 ("GG"), hereinafter 323Lys vgll3 ("LL"), than the 1SW groups, with females exhibiting more pronounced differences than males (Figure 2, Table 2).
For example, in the Escoumins River, only 17% of 1SW females exhibited 54Thr vgll3 ("LL"), whereas 62% of the MSW females carried this genotype. A similar trend was observed in the Malbaie (33% in 1SW females vs. 61% in MSW females) and Trinité (57% in 1SW females vs. 85% in MSW females) populations (Figure 2). However, the Vieux-Fort River population, with a very high proportion of ca. 90% 1SW fish in both sexes (Cauchon, 2015), did not conform to the pattern found in the other three populations (Figure 2). We did not detect any systematic pattern in the distribution of the six6 SNP genotypes (Table 2).
The difference in vgll3 genotype distributions between 1SW and MSW females from the Trinité River was highly significant (p < .001 for Asn323Lys vgll3 and p < .01 for Met54Thr vgll3; Table 2), supporting the hypothesis of a genotype-phenotype association involving vgll3. We also found significant differences in vgll3 genotype distributions between males and females among 1SW fish from the Trinité River (p < .001 for Asn323Lys vgll3 and p < .01 for Met54Thr vgll3; Table 2). Although three of four populations exhibited a similar trend in the distribution of vgll3 genotypes (Figure 2), all other comparisons, whether between the 1SW and MSW groups or between the sexes within a sea age class, yielded at best marginally nonsignificant differences (.05 < p < .1), possibly partly due to unbalanced sample sizes (Table 1). We did not detect any significant differences in the distributions of six6 genotypes between 1SW and MSW fish or between sexes.

| Complementary analysis: Predicting sea age from SNP genotypes
The logistic regression model with the three-way interaction of SNP genotype, sex, and population (river) was identified as the best-fitting model (Table S3). Although the late-maturing vgll3 genotypes 54Thr vgll3 ("LL") and 323Lys vgll3 ("LL") were the most abundant in all four populations (Table 1), the logistic regression analysis of vgll3 SNP genotypes revealed higher probabilities of belonging to the MSW group for genotypes 54Thr vgll3 ("LL") and 323Lys vgll3 ("LL") than for the alternative genotypes ("EL," "EE"), depending on population and sex (Table S4).
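Probabilities of belonging to the MSW group, with their uncertainty, can be obtained by simulating coefficient vectors from an approximate posterior, which is the idea behind the sim function of the R package arm. The Python sketch below is a stdlib analogue under stated assumptions: a two-parameter model (b0 = log-odds of MSW for "EL"/"EE" fish, b1 = extra log-odds for "LL"), with hypothetical estimates and covariance rather than the study's fitted values, drawn from a bivariate normal via a 2 × 2 Cholesky factor.

```python
import math
import random


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_probabilities(mean, cov, n_draws=4000, seed=1):
    """Draw (b0, b1) from a bivariate normal and return P(MSW) draws per genotype class."""
    (v0, c), (_, v1) = cov
    l11 = math.sqrt(v0)              # 2 x 2 Cholesky factor of cov
    l21 = c / l11
    l22 = math.sqrt(v1 - l21 ** 2)
    rng = random.Random(seed)
    p_ref, p_ll = [], []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        b0 = mean[0] + l11 * z1
        b1 = mean[1] + l21 * z1 + l22 * z2
        p_ref.append(inv_logit(b0))       # "EL"/"EE" reference genotypes
        p_ll.append(inv_logit(b0 + b1))   # "LL" genotype
    return p_ref, p_ll


def interval(draws, lower=0.025, upper=0.975):
    """Return (lower quantile, median, upper quantile) of the simulated draws."""
    s = sorted(draws)
    return s[int(lower * (len(s) - 1))], s[len(s) // 2], s[int(upper * (len(s) - 1))]


# Hypothetical estimates (log-odds scale) and their covariance matrix.
p_ref, p_ll = simulate_probabilities(mean=(-0.2, 1.3),
                                     cov=((0.04, -0.035), (-0.035, 0.06)))
for label, draws in (("EL/EE", p_ref), ("LL", p_ll)):
    lo, med, hi = interval(draws)
    print(f"P(MSW | {label}) = {med:.2f} [{lo:.2f}, {hi:.2f}]")
```

Drawing b0 and b1 jointly, rather than independently, matters because the LL probability depends on their sum, whose variance includes the (typically negative) covariance between intercept and effect.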

| Variation at a major sea age locus in North American Atlantic Salmon
In our data set of 1,505 Atlantic Salmon from four North American populations, we confirmed the presence of two nonsynonymous mutations within the vgll3 gene that have been documented in European populations. Moreover, these SNPs seem to be associated with age at maturity to a certain extent; that is, the association varied with sex and population in our study system. In one population (Trinité), the SNP genotypes 54Thr vgll3 ("LL") and 323Lys vgll3 ("LL") occur in significantly higher proportions in MSW than in 1SW females. Interestingly, the late-maturing vgll3 genotypes 54Thr vgll3 ("LL") and 323Lys vgll3 ("LL") were the most abundant in all four populations (between 65% and 92%; Table 1). Moreover, compared to the European study (Barson et al., 2015), the overall prevalence of 1SW females was very low, and because >3SW fish are generally rare in North American populations (Cauchon, 2015), the MSW components of our populations likely contained no or very few >3SW fish. This suggests that the antagonistic selection pressures could differ between European and North American populations. Furthermore, our sampling did not allow us to test for the sex-dependent prevalence of alternative homozygous genotypes in younger (1SW) and older (>3SW) fish that supposedly contributes to the resolution of sexual conflict in the species (Barson et al., 2015).
Therefore, future studies targeting North American vgll3 variation should ideally include populations with both high proportions of 1SW females and >3SW individuals of both sexes. These samples, however, are quite rare in the rivers studied here (Cauchon, 2015) and were not available for our study. Future studies would also benefit from comparisons of vgll3 genotype distributions before and after the marine migration in order to test for differential mortality associated with vgll3 genotypes.

| Parallelism and population-specific effects
Although the extent of genetic divergence and life-history traits varied among the populations we studied, we observed a similar distribution of vgll3 genotypes across sea age classes and sexes in three of four investigated populations and differences between sea age classes in vgll3 genotype distributions were stronger in females than in males.
However, the fourth population, from the Vieux-Fort River, clearly stands out. The Vieux-Fort River has a very atypical life-history structure compared to other populations from Québec, with more than 90% 1SW fish in both sexes (Cauchon, 2015). These characteristic differences in population structure might reflect substantially different selection pressures across rivers. It is also conceivable that a large proportion of the variation in sea age is brought about by other, as yet undetermined genes or by environmental factors that contribute to the prevalence of the 1SW trait.

| SNP genotypes and sex
Genotype frequencies varied between sexes; that is, individuals heterozygous at both investigated nonsynonymous vgll3 SNPs had odds of approximately 2:1 of being female rather than male. These 2:1 odds of being female when heterozygous for vgll3 possibly reflect the influence of the late-maturing allele in increasing sea age (Barson et al., 2015) compared to individuals homozygous for the early allele. The result can also be interpreted in the context of the substantially different reproductive strategies of the sexes, with females maturing later and at larger sizes on average than males.
Thus, carrying a late-maturing allele could hypothetically be more important for female survival and reproductive success than for male survival and reproductive success. A direct link between vgll3 and sex determination in Atlantic Salmon could be an alternative explanation, which cannot be addressed in this study. Clearly, this deserves further investigation in North American populations.
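Odds of 2:1 correspond to a probability of 2/3 (about 67%) of being female, in line with the >66% reported in the Abstract. Because the sex ratio of the whole sample need not be 1:1, a natural way to quantify the heterozygote effect is an odds ratio relative to homozygotes. The Python sketch below uses hypothetical counts and a standard Wald 95% confidence interval on the log odds ratio; it illustrates the arithmetic and is not a test reported in this study.

```python
import math


def odds_ratio_female(het_f, het_m, hom_f, hom_m, z=1.96):
    """Odds ratio of being female for heterozygotes vs homozygotes, with Wald CI."""
    odds_ratio = (het_f / het_m) / (hom_f / hom_m)
    se = math.sqrt(1 / het_f + 1 / het_m + 1 / hom_f + 1 / hom_m)  # SE of log OR
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts chosen so that two in three heterozygotes are female.
het_f, het_m, hom_f, hom_m = 140, 70, 500, 500
print(f"P(female | heterozygous) = {het_f / (het_f + het_m):.3f}")
or_, lo, hi = odds_ratio_female(het_f, het_m, hom_f, hom_m)
print(f"odds ratio = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The Wald interval requires all four counts to be non-zero; with very small groups, an exact or profile-likelihood interval would be more appropriate.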

| Beyond the species border: transferability to other salmonids?
Met54Thr vgll3 and Asn323Lys vgll3 are also present in Brown Trout (Salmo trutta), Rainbow Trout (Oncorhynchus mykiss), and Arctic Char (Salvelinus alpinus) (Ayllon et al., 2015), thus lending support to the idea that vgll3 influences sea age in multiple species. Ayllon et al. (2015) reported that these species all possess the amino acid variants found in 3SW Atlantic Salmon, which correspond to the genotypes associated with later maturation in our sample, that is, 54Thr vgll3 ("LL") and 323Lys vgll3 ("LL"). An allelic discrimination assay on a small sample of five individuals from a landlocked Swedish Atlantic Salmon population further confirmed the 3SW variants of vgll3 in all five individuals, and it was concluded that the European Atlantic Salmon 3SW variants (54Thr vgll3 and 323Lys vgll3) are ancestral, whereas the 1SW variants (54Met vgll3 and 323Asn vgll3) are derived (Ayllon et al., 2015). While the presence of Met54Thr vgll3 and Asn323Lys vgll3 is unchallenged, it is noteworthy that in our data set, between 65% and 92% of the individuals within a population carried the "ancestral" 54Thr vgll3 ("LL") and 323Lys vgll3 ("LL") variants (Table 1).
In light of the low prevalence of late-maturing fish in North American Atlantic Salmon populations, this suggests that different selection pressures might be at work on either side of the Atlantic Ocean.