Genomic regions and signaling pathways associated with indicator traits for feed efficiency in juvenile Atlantic salmon (Salmo salar)

One objective of this study was to identify putative quantitative trait loci (QTL) that affect indicator phenotypes for growth, nitrogen, and carbon metabolism in muscle, liver, and adipose tissue, and for feed efficiency. Another objective was to perform an RNAseq analysis (184 fish from all families), to identify genes that are associated with carbon and nitrogen metabolism in the liver. The material consisted of a family experiment that was performed in freshwater and included 2281 individuals from 23 full-sib families. During the 12-day feed conversion test, families were randomly allocated to family tanks (50 fish per tank and 2 tanks per family) and fed a fishmeal-based diet labeled with the stable isotopes 15N and 13C at inclusion levels of 2 and 1%, respectively. Using a linear mixed-model algorithm, a QTL for pre-smolt growth was identified on chromosome 9 and a QTL for carbon metabolism in the liver was identified on chromosome 12 that was closely related to feed conversion ratio on a tank level. For the indicators of feed efficiency traits that were derived from the stable isotope ratios (15N and 13C) of muscle tissue and growth, no convincing QTL was detected, which suggests that these traits are polygenic. The transcriptomic analysis showed that high carbon and nitrogen metabolism was associated with individuals that convert protein from the feed more efficiently, primarily due to higher expression of the proteasome, lipid, and carbon metabolic pathways in liver. In addition, we identified seven transcription factors that were associated with carbon and nitrogen metabolism and located in the identified QTL regions. Analyses revealed one QTL associated with pre-smolt growth and one QTL for carbon metabolism in the liver. Both of these traits are associated with feed efficiency. However, more accurate mapping of the putative QTL will require a more diverse family material. In this experiment, fish that have a high carbon and nitrogen metabolism in the liver converted protein from the feed more efficiently, potentially because of a higher expression of the proteasome, lipid, and carbon metabolic pathways in liver. Within the QTL regions, we detected seven transcription factors that were associated with carbon and nitrogen metabolism.

an improvement in feed efficiency has considerable value. Feed efficiency can be defined as the ratio of output to input (feed efficiency ratio, FER), i.e. as the ratio of growth to the unit of feed consumed, whereas feed conversion ratio (FCR) is the inverse of this ratio. In fish, genetic variance exists for fecal losses, due to individual variation in digestibility [3]. Genetic variance exists also for maintenance (degradation and replacement of previously deposited protein), and for growth, if the composition of growth is ignored, i.e. for protein metabolism [4]. The relative contribution of nutrients to protein metabolism can be assessed by using feed that is enriched with certain isotopes (i.e., with altered ratios of 13 C/ 12 C and/or 15 N/ 14 N) and monitoring the subsequent rate of change in the isotope profile of different tissues [5][6][7]. Nitrogen and carbon isotopes are the most relevant when assessing feed efficiency because all organic compounds contain carbon, while nitrogen is common to all amino acids. Molecules that contain 14 N versus 15 N differ in mass, and the ratio of these isotopes can be detected with an element analysis isotope ratio mass spectrometry system [8]. The genetic components of nitrogen and carbon metabolism in salmon were elucidated in an earlier study by Dvergedal et al. [4] by measuring the rate of change in the isotope profiles in different tissues. The isotope profiles were used to calculate indicators of FER/FCR (isotope-based FER (IFER)/ FCR (IFCR)). In addition, they (r g ) of tank-FCR with indicator traits based on nitrogen and carbon metabolism in muscle tissue measured by using stable isotopes ( 15 N and 13 C) (r g ~ 1.0), and with carbon metabolism in liver (ALC) (r g ~ 0.9). These results are in accordance with those reported by Hawkins et al. [9], who proposed that differences in protein metabolism between individuals are genotype-dependent. Efficient fish are characterized by high protein growth and reduced protein degradation in muscle at the same relative growth rates [10]. In addition, Dvergedal et al. [4] showed that growth, isotope-based indicator traits, and sampling day jointly explained 73% of the observed variance in tank-FCR records, compared to 53-63% by growth and sampling day alone [4]. Hence, including nitrogen and carbon metabolism traits for relevant tissues substantially improved the prediction of FCR. Establishing the genetic basis of individual differences in feed utilization may have major implications for selection in aquaculture breeding programs. Moreover, genetic improvement of feed efficiency, by selection for growth or by other means, will decrease production costs and the environmental footprint per unit produced [11,12].
To date, no genome-wide association study (GWAS) has reported quantitative trait loci (QTL) related to FER/ FCR in Atlantic salmon [13,14]. However, in beef cattle [15,16], chicken [17][18][19], pigs [20,21], and some other fish species [22,23], QTL have been detected primarily for feed conversion efficiency or residual feed intake. Because individual phenotypic records are difficult to obtain, it is not easy to assess FER/FCR in aquatic species. With indicator phenotypes for nitrogen and carbon metabolism, feed efficiency can now be evaluated at the individual level and used in GWAS [24] and for markerassisted selection [24,25].
This study is based on a large-scale family experiment in Atlantic salmon, where families were kept separately in replicate tanks, with individual recording of growth and isotope profiles after feeding on 15 N and 13 C-enriched feed. The objective was to identify QTL that affect relevant indicator phenotypes for FER/FCR: weight gain (WG), relative weight gain (RG), atom % 13 C in muscle (AMC), atom % 15 N in muscle (AMN), atom % 13 C in liver (ALC), atom % 15 N in liver (ALN), atom % 13 C in adipose tissue (AAC), the indicator trait of FCR for AMC (IFCR_AMC), the indicator trait of FCR for AMN (IFCR_ AMN), the indicator trait of FER for AMC (IFER_AMC), and the indicator trait of FER for AMN (IFER_AMN) [4]. Another objective was to use RNAseq analysis to identify genes the expression of which is associated with carbon and nitrogen metabolism in the liver.

Phenotypic data
Phenotypic data on Atlantic salmon were collected from a family experiment that was carried out at the fish laboratory of the Norwegian University of Life Sciences (NMBU) in Aas, Norway. Details on this experiment are in Dvergedal et al. [4]. Broodstock from AquaGen's breeding population (22 males and 23 females) were used to generate 23 families. To ensure clearly contrasted family groups with respect to growth potential and feed efficiency, divergent parents were selected based on high and low estimated breeding values (EBV) for growth in seawater.
Prior to the start of feeding, multiple families were kept in separate compartments within the same tank, with five tanks required to house all the families. Based on parentage assignment, 100 family members were identified for each of the 23 families and reared together in a single tank from the start of feeding until the start of the feed conversion test. Prior to the 12-day test, families were allocated to tanks, 50 fish per tank and two tanks per family (except for nine tanks in which the number of fish varied from 42 to 54 due to mortality prior to the start of the experiment and a counting mistake), for a total of 2281 fish. Families were fed a fishmeal-based diet labeled with the stable isotopes 15 N and 13 C at inclusion levels of 2 and 1%, respectively, as described by Dvergedal et al. [4]. Feed conversion rate was recorded at the family level.
Phenotypic data were recorded individually for WG, RG, AMC, AMN, ALC, ALN, and AAC, as described by Dvergedal et al. [4], resulting in phenotypes for 2249 to 2280 fish per trait. Muscle, liver, and adipose samples from each individual were collected in cryotubes and snap-frozen in liquid nitrogen for stable isotope analysis. The sampling procedure and determination of atom % 15 N and 13 C in the samples are explained in detail in Dvergedal et al. [4]. The stable isotope analysis was carried out at the Institute for Energy Technology (Kjeller, Norway).
From the individual ( i ) phenotypes of AMC and AMN, individual isotope-based indicator traits for FCR and FER (IFCR and IFER, respectively), i.e. IFCR_AMC i , IFCR_AMN i , IFER_AMC i , and IFER_AMN i , were derived as follows (taking 15 N as an example): where IW i and FW i is the initial and final weight of fish i , APE Ni = (AMN i − IA%) , with IA % equal to 0.370% for 15 N and 1.087% for 13 C. After a diet switch, the atom percentage in excess (APE) of a stable isotope in muscle tissue is expected to be proportional to the fraction of newly synthesized nutrients in the muscle, and the product of APE and final weight is expected to be proportional to the mass of new nutrients in the body tissue. Because IFCR is expected to be proportional to the amount of newly deposited body nutrients per g increase in body weight, fish that exchange a larger fraction of the body mass per unit of growth will be less feed-efficient. Exchange of body tissue is traceable with stable-isotope profiling and is related to the feed intake of the individual, the denominator of the ratio is weight gain, and the ratio between these two variables equals IFCR or the inverse equals IFER.

Genotypic data
When the fish reached 5 to 10 g, they were pit-tagged with a 2 × 12 mm unique glass tag (RFID Solutions, Hafrsfjord, Norway), and a fin-clip was collected from 2300 fish for DNA-extraction and genotyping. Fin clips (20 mg) were incubated in lysis buffer and treated with proteinase K (20 µg/ml) at 56 ℃ overnight. The following day, DNA was isolated from the lysate at Biobank AS (Hamar, Norway) using the Sbeadex livestock kit (LGC Genomics) according to the manufacturer's protocol (Thermo Fisher Scientific). DNA concentration was measured using a Nanodrop 8000 (Thermo Fisher Scientific). All fish were genotyped using AquaGen's custom Axiom ® SNP (single-nucleotide polymorphism) genotyping array from Thermo Fisher Scientific (former Affymetrix) (San Diego, CA, USA). This SNP-chip contains 56,177 SNPs that were originally identified based on Illumina HiSeq reads (10-15 × coverage) from 29 individuals from AquaGen's breeding population. Genotyping was done at CIGENE (Aas, Norway). Genotypes were called from the raw data using the Axiom Power Tools software from Affymetrix. Individuals with a Dish-QC score lower than 0.82, and/or a callrate lower than 0.97, and/or more than 10% missing genotypes were removed from further analyses. Also, SNPs with a minor allele frequency (MAF) lower than 1%, and with a missing call rate greater than 10% were removed. After filtering, 54,200 SNPs were included in the analysis.

Association analysis
Associations between each SNP and the phenotypes related to nitrogen and carbon metabolism (AMC, AMN, ALC, ALN, and AAC), growth (WG, RG), and indicator traits of feed efficiency (IFCR_AMC, IFCR_AMN, IFER_AMC, and IFER_AMN) were tested by using a linear mixed-model algorithm implemented in the GCTA software [26]. The leave-one-chromosome-out option (-mlm-loco) was used, in which the chromosome that contains the tested SNP was left out when building the genetic relationship matrix (GRM). The linear mixed model used can be written: where Y i is the phenotypes of individual i for one of the evaluated traits, a is the intercept, b is the fixed regression for the SNP to be tested for association, x is the SNP genotype indicator variable, coded as 0, 1 or 2, g − i is the random polygenic effect for individual i , assumed to be distributed as ~ N (0, Gσ 2 g ), where G is the genomic relationship matrix, computed using the genotyped SNPs on all chromosomes except on the chromosome on which the tested SNP is located, σ 2 g is the variance of the polygenic effect, and ε i is a random residual. In this algorithm, σ 2 g is re-estimated each time a chromosome is left out from the calculation of the GRM. The level of significance was evaluated using a built-in likelihoodratio test. The threshold value for 5% genome-wide significance was derived using the Bonferroni correction as 0.05/54,200 = 9.23 × 10 -7 , corresponding to a −log10 p-value (p) of 6.03. The number of SNPs on each chromosome was used to calculate 5% chromosome-wide significance levels. The Bonferroni correction is known to be overly conservative especially when applied to correlated SNP data, i.e., to SNPs that are in linkage disequilibrium, which can produce an excess of false-negative results [27]. Manhattan plots were used to visualize the −log10 (p) of SNPs across the chromosomes (n = 29) (Figs. 1 and 2) and QQ-plots were used to visualize the distribution of observed versus expected genome-wide −log10 (p) (Figs. 3 and 4).

RNA extraction and sequencing
RNA was extracted from the liver of 184 fish, at ~ 10 months of age (four fish per tank), using the RNeasy Plus Universal Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol. The RNA concentration was determined using a Nanodrop 8000 (Thermo Fisher Scientific, Waltham, USA), and RNA quality was determined using a 2100 Bioanalyzer with the RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, USA). All samples had an RNA integrity number greater than 8. Sequencing libraries were constructed using the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, USA) according to the manufacturer's instructions. Sequencing was performed on an Illumina Hiseq 2500 at the Norwegian Sequencing Center (Oslo, Norway), using 100 bp single-end sequencing. All raw fastq-files have been deposited on ArrayExpress under accession number E-MTAB-8305. Reads were trimmed, aligned to the salmon genome (ICSASG_v2), and counted using the bcbio-nextgen pipeline (https :// githu b.com/bcbio /bcbio -nextg en) and the NCBI salmon genome annotation (https ://salmo base.org/Downl oads/ Salmo _salar -annot ation .gff3).

RNAseq analysis
The EdgeR software [28] was used to identify genes that had an expression in the liver associated with the nitrogen and carbon metabolism liver traits (ALN and ALC), using the following linear regression model: where Y Fij is the expression of gene i for fish j in family F , µ F is the intercept for each family, b 1 is the fixed regression coefficient of the phenotype for the trait ( x j = ALC or ALN of fish j ), and b 2 is the fixed regression coefficient of the ratio of weight gain (final weight ( FW )-initial weight ( IW )) to FW , to account for the effect of growth on gene expression, and ε Fij is a random residual. The trait phenotypes were scaled and centered (mean = 0 and SD = 1), such that the resulting regression slopes could be compared between traits. Genes with an expression that had significant regression coefficients on the trait at a false discovery rate (FDR) corrected p-value (q) < 0.05 were classified as trait-associated genes (TAG). TAG were analyzed for over-representation in KEGG pathways using the "kegga" function in the limma R-package [29]. Genes that encoded transcription factor proteins were identified by using the R-package SalMotifDB (https ://salmo base.org/apps/SalMo tifDB ) [30], which interacts with a database of transcription factors for salmonids.

Results and discussion
Efficient fish minimize the loss of deposited nutrients per unit growth, which is expected to affect the rate of change in the observed isotope profile. Using isotope data, individual phenotyping of feed efficiency is possible in Atlantic salmon even without obtaining registrations of individual feed intake. Estimates of heritability, genetic and phenotypic correlations among the studied traits and FCR were previously reported by Dvergedal et al. [4] and showed that the indicator feed efficiency traits IFCR/ IFER in muscle had estimates of the genetic correlation with FCR on a tank level that were at the boundary of the parameter space (r g ~ 1.0). However, ALC also showed a high genetic correlation with FCR on a tank level (r g = − 0.90) but was less genetically correlated with growthrelated traits than with IFCR/IFER on an individual level. Consequently, ALC may explain individual variation in feed efficiency that is not related to growth.

Association analysis
To test whether phenotypes for feed efficiency such as ALC and IFCR/IFER variables are associated with SNPs, we performed a GWAS with a linear mixed-model algorithm, using indicator traits related to nitrogen and carbon metabolism, growth, and indicator traits for feed efficiency as phenotypes. To our knowledge, this is the first GWAS applied to indicator traits of feed efficiency and metabolism in muscle, liver, and adipose tissues of Atlantic salmon. Figure 1 illustrates significant associations between SNPs and traits of interest and significant associations with a −log10 (p) > 8 are in Table 1. The Manhattan plots for WG, RG, AMC, AMN, and ALN ( Fig. 1a-f ), showed two genomic regions that displayed a significant association on salmon chromosome (Ssa) 9, one between 13 and 31 Mb and one between 45 and 106 Mb. Many of the significant SNPs were shared between traits (Table 1). On Ssa9, four to 10 significant SNPs were in common between the traits WG, RG, AMC, AMN, and ALN, which may be because all these traits are closely associated with growth. Also, two SNPs on Ssa3, four on Ssa5, and 12 on Ssa20 were associated with WG, all with a −log10 (p) > 6.03 (Fig. 1a), while one SNP on Ssa11, one Ssa12, and one Ssa15 associated with AMN (Fig. 1d), and one SNP on Ssa20 with ALN (Fig. 1f ). No significant SNP associations with AAC were found at the evaluated age (~ 10 months), which might be because lipid deposition is at its maximum beyond this age, during the grow-out phase in the sea (~ 1.5-4 kg). Therefore, we cannot rule out significant SNP associations with lipid deposition at a later lifestage in salmonids [31][32][33]. Genome-wide significant SNPs that were associated with growth-related traits, such as WG, RG, AMC, AMN, and ALN, were mainly located on Ssa09. Gutierrez et al. [34], who mapped QTL for body weight in Atlantic salmon at different stages of life, reported genomewide significant SNPs (QTL) on Ssa09 for individuals of the same age as in this study (~ 10 months). They also detected chromosome-wide significant SNPs on Ssa20, but did not find any common significant SNPs at different stages of life. Baranski et al. [35] argued that the large number of QTL for growth-related traits that act at different stages of life implies that bodyweight is a polygenic trait in Atlantic salmon. However, as growth of salmons occurs mostly during the saltwater phase, the commercial interest of a QTL for body weight in the freshwaterphase is most likely limited. Nineteen genome-wide SNPs (−log10 (p) > 8) were also found on a Ssa12 region that affects ALC (Fig. 1e). Dvergedal et al. [4] showed that ALC had low genetic correlations with growth (WG and RG) (r g = 0.16 and 0.12, respectively, with standard errors of 0.12 and 0.14), which suggest that ALC might be genetically independent of growth (WG and RG). From an economic point of view, a QTL that is involved in improving feed efficiency without being associated with growth would be highly relevant and would add valuable information that cannot be captured by recording the growth of the individuals only. However, the SNPs of interest were spread over a 40-Mb region (between 34 and 73 Mb) that contains 770 genes (NCBI search).
Only three genome-wide significant SNP associations were found for the indicator trait IFER_AMN, on Ssa06, Ssa23, and Ssa27 (Fig. 2d), and none for IFCR_AMC, IFCR_AMN, and IFER_AMC (Fig. 2a-c), which suggests that all IFCR and IFER variables are polygenic traits. In fact, Dvergedal et al.   low heritability estimates for these indicator traits of feed efficiency, but this study included few families, and thus if a putative QTL does exist in the population it might not be represented in the 23 families analyzed. This means that a larger number of families should be studied, which might increase the number of haplotypes represented in the data and increase the possibility to accurately pinpoint the position of a QTL. Moreover, with strong family structures, long stretches of the same haplotype that are identical-by-descent [21] are likely to occur, which can result in the wide peaks, and in reducing the probability of finding significant SNPs for the indicator traits IFCR/IFER in these data.
With few but large families, our dataset is appropriate to detect QTL but not to fine map them, since longer DNA segments surrounding the QTL may be significant. For the traits for which significant QTL were detected, the shape of the QQ-plots (Fig. 3a-f ) can be explained by the substantial linkage disequilibrium (LD) in the population as a result of the limited effective population size. For AAC (Fig. 3g) and the IFCR/IFER traits (Fig. 4a-d) for which no significant QTL were found, the shape of the QQ plots was as expected.

RNAseq analysis
To assess the effects of ALC and ALN on gene expression, we sequenced RNA from the liver of 184 fish, which represented a subset of all the families included in this experiment. Regression analysis using ALC as a covariate identified 799 TAG with positive associations of gene expression and ALC and 741 TAG with negative associations (FDR corrected p-value (q) < 0.05). For ALN, we identified 900 TAG with positive associations and 978 TAG with negative associations (q < 0.05). Three hundred and seventeen genes were shared among the TAG with positive associations between ALC and ALN and 281 TAG were shared among the TAG with negative associations (Fig. 5a).
KEGG enrichment analysis of the positive and negatively associated TAG revealed more enriched KEGG pathways (p < 0.05) among the positively associated TAG (59 for ALC and 88 for ALN) than among the negatively associated TAG (24 for ALC and 35 for ALN). All TAG and enriched KEGG pathways are in Additional file 1: Table S1 and Additional file 2: Table S2 respectively. "Proteasome" was the most enriched pathway for both ALC and ALN. Since the stable nitrogen and carbon isotopes ingested by the fish were mainly in the form of protein, higher proteasome levels likely accelerate the breakdown and incorporation of stable isotopes in tissues of the fish. Another highly enriched pathway for both traits was the "TCA cycle", which is central to carbon metabolism and converts carbon from proteins, lipids, and carbohydrates into acetyl-CoA for fatty acid and cholesterol biosynthesis and NADPH for a variety of cellular processes (Fig. 5b). Two amino acid metabolism pathways "Glutathione metabolism" and "Selenocompound metabolism" were positively associated with high ALC and ALN. Both glutathione and selenocysteine are key components of glutathione peroxidases, which represent a family of antioxidant enzymes involved in counteracting the negative effects of reactive oxygen species produced during oxidative phosphorylation [36]. In addition, we observed that the cholesterol biosynthesis pathways "Terpenoid backbone synthesis", "Steroid biosynthesis" and the fatty acid metabolism pathways "Fatty acid degradation" and "Fatty acid biosynthesis" were positively associated with ALC and ALN (Fig. 5b). Cholesterol and fatty acids are key components of cell membranes, and a higher activity of these pathways could contribute to a higher level of 13 C incorporation in tissues. The pathways that were negatively associated with ALC and ALN, included several signaling pathways and also the "Autophagy" and "Mitophagy" pathways ( Fig. 5b). Taken together, our results suggest that individuals with high rates of liver carbon and nitrogen metabolism (ALC and ALN) convert protein from the feed into body tissue more efficiently, primarily because of higher proteasome gene expression, but also because of elevated oxidative phosphorylation and lipid metabolism.
To identify potential genes that drive differences in gene expression that are associated with ALC and ALN, we cross-referenced our list of TAG within the quite large regions that were identified on Ssa9 and 12 and that harbored 2532 genes. Among these genes, 177 were in our list of TAG, of which seven are known transcription factors (TF) ( Table 2). Four and three of these TF had significantly (p < 0.05) negative associations with ALC and ALN, respectively, which agrees with our KEGG analysis that showed that negatively associated pathways were the most enriched in signaling pathways (Fig. 5b). Two of the TF belong to the FoxO family, which is involved in a range of physiological processes, including insulin signaling, autophagy, and proteasome regulation [37].

Conclusions
In this study, we identified two important QTL for performance of Atlantic salmon in freshwater, one for growth on Ssa9, and one for carbon metabolism in the liver on Ssa12. Carbon metabolism in the liver was closely related to FCR at the tank level. However, we were not able to accurately map the putative QTL. For the IFCR/IFER phenotypes that were derived from the ratios between the fraction of stable isotopes ( 15 N and 13 C) in muscle and growth, no convincing QTL were detected. Transcriptomic analysis revealed a positive association of ALC and ALN with the ability to convert protein from the feed into body tissue more efficiently, primarily through the increased expression of proteasome, lipid, and carbon metabolic pathways in the liver. In addition, we identified seven transcription factors that were associated with ALC or ALN and that were located in the QTL identified.