Deploying new generation sequencing for the study of flesh color depletion in Atlantic Salmon (Salmo salar)

The flesh pigmentation of farmed Atlantic salmon is formed by accumulation of carotenoids derived from commercial diets. In the salmon gastrointestinal system, the hindgut is considered critical in the processes of carotenoids uptake and metabolism. In Tasmania, flesh color depletion can noticeably affect farmed Atlantic salmon at different levels of severity following extremely hot summers. In this study, RNA sequencing (RNA-Seq) was performed to investigate the reduction in flesh pigmentation. Library preparation is a key step that significantly impacts the effectiveness of RNA sequencing (RNA-Seq) experiments. Besides the commonly used whole transcript RNA-Seq method, the 3’ mRNA-Seq method is being applied widely, owing to its reduced cost, enabling more repeats to be sequenced at the expense of lower resolution. Therefore, the output of the Illumina TruSeq kit (whole transcript RNA-Seq) and the Lexogen QuantSeq kit (3’ mRNA-Seq) was analyzed to identify genes in the Atlantic salmon hindgut that are differentially expressed (DEGs) between two flesh color phenotypes. In both methods, DEGs between the two color phenotypes were associated with metal ion transport, oxidation-reduction processes, and immune responses. We also found DEGs related to lipid metabolism in the QuantSeq method. In the TruSeq method, a missense mutation was detected in DEGs in different flesh color traits. The number of DEGs found in the TruSeq libraries was much higher than the QuantSeq; however, the trend of DEGs in both library methods was similar and validated by qPCR. Flesh coloration in Atlantic salmon is related to lipid metabolism in which apolipoproteins, serum albumin and fatty acid-binding protein genes are hypothesized to be linked to the absorption, transport and deposition of carotenoids. Our findings suggest that Grp could inhibit the feeding behavior of low color-banded fish, resulting in the dietary carotenoid shortage. Several SNPs in genes involving in carotenoid-binding cholesterol and oxidative stress were detected in both flesh color phenotypes. Regarding the choice of the library preparation method, the selection criteria depend on the research design and purpose. The 3’ mRNA-Seq method is ideal for targeted identification of highly expressed genes, while the whole RNA-Seq method is recommended for identification of unknown genes, enabling the identification of splice variants and trait-associated SNPs, as we have found for duox2 and duoxa1.


Background
The unique pink-red flesh color of salmon is considered to be a decisive factor for its marketabilty as it tends to be associated with freshness and quality. This makes flesh color a key criterion in the salmon farming industry [1]. In the wild the muscle pigmentation of some salmonids is the result of the accumulation of carotenoids, primarily astaxanthin (Ax), derived from the consumption of algae which are Ax main producers, and/or crustaceans which fed on the algae [2]. The muscle pigmentation of farmed salmon relies on a synthesized form of carotenoids included in commercial diets [2]. Due to its hydrophobicity, Ax is not solubilized in the aqueous environment of the fish gastrointestinal system [3]. Several studies suggest that the addition of dietary lipids tends to enhance the deposition of Ax in salmon flesh [4,5]. Changing the dietary fatty acid composition could influence the solubility of Ax in the lipid phase [3,6].
Several studies examined the effect of temperature on the growth and development of Atlantic salmon as well as other salmonids [7][8][9]. The optimal temperature range for Atlantic salmon lies between 12 and 18°C [7]. Atlantic salmon display behaviors that are adaptive to the daily fluctuations in environmental temperature and dissolved oxygen in farming sea cages. Individual fish tend to adjust their depth to stay at the optimal temperature of 12˚C to 18˚C and show a trend of avoiding low dissolved oxygen [10,11]. Other studies have shown that exposure to temperatures above the upper optimal temperature significantly affected growth performance by reducing feed intake [9,12,13]. Therefore, the elevated summer temperatures experienced by Atlantic salmon grown in Tasmania may affect the fish physiology by inducing a stress response [9,14]. This is evident at the molecular level by the presence of altered heat shock protein and antioxidant pathways found in salmon exposed to high water temperatures [15][16][17][18]. In recent years, global warming has been marked as a key factor with potential devastating impacts on the aquaculture industry, especially in cold-blooded fish [19]. This led to an increased focus on investigating the effect of elevated temperature on the flesh color of salmonids such as Atlantic salmon [18]. Recently, Grunenwald, Adams [20] were able to induce a depletion of color in Atlantic salmon muscle as a result of short term thermal stress. In Tasmania, farmed Atlantic salmon can be exposed to water temperatures of 20˚C and above for extended periods during summer, which can exacerbate flesh color depletion [21]. In 2015-16 Tasmania reported an unprecedented extreme hot summer, during which farmed salmon experienced water temperatures of over 20 o C, which negatively impacted on both performance and flesh colour (i.e. in some cases flesh colour was almost completely depleted) [20,21].
Over a decade ago, next-generation sequencing (NGS) has been applied in transcriptomic analysis [22,23] and has since evolved into a universally-applied tool in molecular biology. Using NGS technologies, RNAsequencing (RNA-Seq), can be aligned to the reference genome or de novo assembled, making it a widely used tool also for non-model organisms [24,25]. Not only does it enable characterization of the transcriptome's complexity, but RNA-Seq also enables quantification of gene expression level, calling alternative splicing variations and single nucleotide polymorphisms (SNPs) [24,26]. In recent years, the cost of NGS analysis decreased considerably, further contributing to its popularity [27]. To date, Illumina has been the most widely used sequencing platform. Using a fluorescence-based reading of multiple short nucleotide sequences, it enables unparalleled output, both in terms of sequencing depth and quality [28,29]. Although the sequencing unit price is steadily decreasing [24], the expense of library preparation is still considerable [27], and the Illumina protocol for sample library construction (TruSeq, which is considered as a gold standard RNA-seq protocol) is time and labor-intensive [24]. These factors limit the number of samples sequenced using this approach in any given study.
Several library preparation methods have been developed for Illumina RNA-seq, bringing about various choices for users [30]. Lexogen's QuantSeq 3' mRNA library preparation (QuantSeq) benefits include reduced price, time and labor in comparison with the standard RNA-Seq [31]. QuantSeq is a quick and straightforward protocol that creates a library of sequences close to the 3' end of polyadenylated RNAs from only 0.5-500 ng of total RNA input [31]. In addition, there is no need to fragment the mRNA prior to cDNA synthesis [32]. Due to producing only one fragment per transcript, the number of reads aligned to a given gene is the percentage of its expression, regardless of the transcript length [31]. Hence, QuantSeq requires lower sequencing depth than the standard TruSeq and allows for pooling many more libraries on one lane of the flow cell for multiplex sequencing [31][32][33]. The shortfall of Quantseq is that it does not enable calling alternative splicing variations and SNPs.
In this study, we compared Illumina's TruSeq kit, the standard RNA-Seq (refered to as TruSeq) and Lexogen's QuantSeq kit, the 3' mRNA-Seq method (QuantSeq) for an Atlantic salmon flesh color investigation. Firstly, QuantSeq was used to compare differentially expressed genes (DEGs) between four different flesh color phenotypes. To confirm the results and to analyze whether there are any differences between TruSeq and QuantSeq, TruSeq was used on two chosen sample subsets from two flesh color phenotypes. The sequencing results from TruSeq and QuantSeq libraries were compared to understand their relative benefits and drawbacks.

Library preparation and RNA-sequencing
Post reads quality control, an average of 1.9 million and 21.8 million clean reads were obtained per sample for the QuantSeq and TruSeq libraries, respectively. There was not a marked difference in read mapping rate between the sequences from the two sources, with 85 and 88 % of the reads from QuantSeq and TruSeq, respectively, aligning to the salmon genome (data not shown).

Gene body coverage
The distribution of mapped reads along the transcripts is presented in Fig. 1A. As expected, QuantSeq-Lexogen reads map mainly to the 3' end, while TruSeq-Illumina reads covered the transcripts entirely, with a slight decrease at the 5' and 3' ends.
To illustrate the gene body coverage at the single gene level, we used SH2 domain-containing protein 4A-like (sh24a), where the gene body coverage showed a clear difference in mapping positions between two libraries (Fig. 1B). The QuantSeq reads covered only the last exon at 3' end, while TruSeq reads covered all exons of the gene with only a slight decline in the exon at the 5' end. Therefore, the QuantSeq method could not detect gene isoforms correctly in multi-exonic genes with long transcripts and alternative splicing.

Distribution of reads on different transcript length
Overall, the proportion of mapped genes was similar between QuantSeq and TruSeq libraries (Fig. 2 A and C). Most reads mapped to mRNA smaller than 8000 bp. Figure 2 A and C show that for each gene, TruSeq showed 100 fold higher number of mapped reads (following normalization), consistent with this technology offering a much higher depth of sequencing. In both libraries, most reads mapped to mature mRNA around the size range of 2,000 bp length with Quantseq showing less than 17.5 reads/gene (Fig. 2B) compared with TruSeq with about 1500 normalized mapped read count/gene (Fig. 2D). For the shorter mature mRNAs (< 2,000 bp) a  higher read count was detected with the Quantseq libraries (up to 1,000 read count) (Fig. 2 A) and up to 100,000 read count for the TruSeq libraries. Figure 2E shows the percent of genes per size class that were detected with each library. It illustrates that the depth of sequencing affects the percent of genes identified and as can be expected, there were more expressed genes detected when the sequencing depth is much higher.  (Fig. 3). This result shows that DEGs relating to flesh color phenotype does not manifest in low depth of sequencing despite having many replicates per treatment (from 8 to 13 replicates).

Differentially expressed genes
To examine the question if the lack of differential expression between the two phenotypes is a result of the shallow sequencing depth, the sequencing was repeated using Illumina TruSeq Library Preparation which provides higher sequencing depth.
Using the same ten samples in two group HN and LB, a comparison of the effectiveness of QuantSeq and Tru-Seq libraries was conducted. When the PCA plots of sequencing results for the two subsets of samples were compared, there were two significant clusters based on flesh color phenotypes in both QuantSeq and TruSeq ( Fig. 4 A & B). In the TruSeq data, 191 DEGs were identified while only 50 DEGs were identified from the QuantSeq data when the two subsample groups were    Table 1 compared. Out of these, only 11 DEGs overlapped between the two methods and all had high normalized counts (Fig. 5). These genes relate to oxidative stress and/or immune response (Table 1). Interestingly, the high variation in expression of these DEGs showed a similar pattern in the TruSeq and QuantSeq datasets. The fact that these 11 DEGs showed the same pattern with the two library methods used instil more confidence in the validity of these DEGs. Forty-eight (96 %) and 165 (86.4 %) DEGs (QuantSeq and TruSeq, respectively), had GO annotation (Supplementary 1). Due to higher number of DEGs, there were more GO terms annotated in TruSeq library than QuantSeq.
In the QuantSeq library, we detected 50 DEGs (Supplementary 2a) related to metal ion transport, oxidationreduction process, immune responses and lipid metabolism. Regarding iron ion transport, ferritin middle subunit (fertn-m) was upregulated in HN fish. Also, cytochrome (CY) c oxidative subunit 6 A mitochondriallike (cypc6a) in LB fish and cathepsin L1-like (catl1) and cathepsin B (catb) were upregulated in HN fish. With regards to lipid metabolism, several genes related to lipid binding and transporters are upregulated in HN fish including apolipoprotein A-I (apoa1), apolipoprotein C-I (apoc1), serum albumin 2 (alb2) and fatty acid-binding protein 1, liver (fabp1) (Fig. 6 A).
DEGs detected in the TruSeq library (Supplementary 2b) that were different to those found in the Quantseq library, were also linked with metal ion transport, oxidation-reduction process and immune responses. Regarding the oxidation-reduction process, cypc6a was not a DEG, but three other genes belonging to the CYP450 family were upregulated in HN fish. These include CYP450 2K1-like (cyp2k1), CYP450 3A27-like (cyp3a27) and CYP450 1B1-like (Cyp1b1) (Fig. 6B). In our present study, gastrin-releasing peptide (grp) was upregulated in LB fish, which did not consume food well, while Grp expression in HN fish was lower despite the fact they grew and displayed food uptake.

RT-qPCR validation
Further validation using qPCR was performed on 15 DEGs; five that were DEGs only in the QuantSeq data, 5 that were DEGs only in the TruSeq data and 5 that were DEGs in both datasets. The expression pattern of fifteen genes was validated using RT-qPCR, in comparison with their expression pattern calculated using the QuantSeq and TruSeq libraries (Fig. 7). While the trend of expression was very similar between DEGs measured in both libraries and the qPCR validation, there were only 6 out of 15 of genes that were significantly differentially expressed when assessed using qPCR. Out of five DEGs that belonged to the group significantly expressed in both libraries, there is only one gene, ldtn, which is significantly differentially expressed according to the RT-qPCR result (P < 0.05). From the five DEGs detected in the QuantSeq data, two genes, fabp1 and lyama, were validated as DEGs using RT-qPCR, and three out of five TruSeq data DEGs were validated as DEGs by RT-qPCR (P < 0.05).

SNP analysis
Using Illumina data with high sequencing depth, we found several SNP variants specific to samples of high color-no banding (HN) or low color-banded (LB) traits. In this analysis, we focused on variants located on DEGs or carotenoid-related genes and may cause a missense mutation, assuming that replacement of amino acids could affect protein structure or assembly rate. Among 191 DEGs identified in the Truseq library, there were 5 DEGs with nine missense mutations and 2 DEGs with three missense mutations found in HN and LB fish,   , involved in iron ion transport; cytochrome c oxidative subunit 6 A mitochondrial-like (cypc6a), involved in oxidation-reduction process; cathepsin L1 (catl1) and cathepsin B (catb) involved in apoptosis and muscle degradation; apolipoprotein A1 (apoa1), apolipoprotein C1 (apoc1), serum albumin 2 (alb2), fatty acid-binding protein 1 (fabp1) involved in lipid metabolism; (B) Expression pattern of seven genes identified in the TruSeq library: cyp450 gene family cyp2k1, cyp3a27 and cyp1b1, involved in oxidation-reduction process; gastrin-releasing peptide (grp) involved in regulation of feeding, and tmprss9, dual oxidase 2-like (duox2), dual oxidase maturation factor 1-like (duoxa1) involved in SNP analysis. Asterisks (* and **) indicate significant difference between the HN and LB phenotypes at P < 0.05 and P < 0.01, respectively respectively. In HN fish, nine missense mutations were detected in fucolectin-3-like, serine-aspartate repeatcontaining protein F-like, transmembrane protease serine 9-like, dual oxidase 2-like, and dual oxidase maturation factor 1-like. Three missense mutations were identified in LB fish in fc receptor-like protein 5 and extended synaptotagmin-like protein 2 in LB fish. Most carotenoid-related genes did not have any variants that can be considered reliable SNPs because the read depth on these positions was low (less than ten reads), or the alternative allele count was much lower than the reference allele count (20 % or less). Reliable SNPs were considered if they were found in at least three out of five individuals of only one of the two assessed phenotypes. In 13 carotenoid-related genes in the HN fish, missense mutations were identified. We found Pro21Thr missense mutation, located on ATPbinding cassette, sub-family G (white), member 8 (abcg8) that occurred in three of five heterozygous individuals of high color-no banding group.
When examining SNPs in DEGs, we identified reliable SNPs located in genes associated with immune response and oxidative stress. In the LB fish group, a missense mutation on transmembrane protease serine 9-like (tmprss9) (Fig. 6B) that causes the substitution of Histidine with Arginine in amino acid 22 (His22Arg) was present in all five analyzed individuals and absent in all five of the other phenotype. Moreover, among the isoforms of dual oxidase 2-like (duox2) and dual oxidase maturation factor 1-like, (duoxa1) (Fig. 8), we found up to 3 SNPs for each gene. Three missense mutations, including Ser267Gly, Ala740Asp, and Arg974Gln on duox2 gene and three other missense mutations involving Leu55Phe, Ala213Thr, and Glu239Gly on duoxa1 were present in the same four of the five low colorbanded fish and absent in the other phenotype (Fig. 8).

Discussion
In Atlantic salmon, flesh color is a complex qualitative trait, where its formation is influenced by several factors, including dietary formulations and absorption of carotenoid pigments, water and lipid content of flesh, as well as reproductive stage [34,35]. Flesh discoloration in Atlantic salmon exposed to prolonged hot summers in Tasmania was recorded in recent years [36,37] and is associated with reduced feed intake observed by the presence of feaces in the hindgut [38]. The number of starving fish peaks in March, and then declined over June to disappear when harvested in August [38]. This finding indicates that individuals drop in appeptite followed by starvation owing to the stress response during prolonged elevated temperature in summer. Grunenwald, Adams [20] pointed out that increased dietary astaxanthin, canthaxanthin and vitamin A did not prevent flesh discoloration. In optimal thermal conditions, temperature positively influences plasma Ax level and possibly controls the efficiency of Ax metabolism [39]. More recently, Wade, Clark [21] investigated the effects of a summer heatwave on farmed Atlantic salmon from two distinct groups cultured in sea cages. During that summer where the temperature was for 117 days above 18 o C, flesh color was reduced initially in the front dorsal and then in the back central region. Furthermore, in autumn, a feeding and coloration recovery did not occur evenly among individuals. In the present study, we sampled fish from a single cage in June 2017. This time point was a few months past the heat stress of summer. From historical observations, while some fish at that site had recovered feeding and flesh color, others could not grow well and improve flesh colour at that time point. Therefore, we expected to find fish with different flesh color phenotypes. We hypothesized that the prolonged elevated temperature would not only affect flesh color, but also induce other changes in response to thermal stress, such as starvation and lipid metabolism. In which, these changes might no longer be recovered in some individuals. The flesh coloration variation in response to thermal stress, lipid metabolism and starvation were caused by molecular mechanisms that we studied at the gene expression level. Using both TruSeq and QuantSeq on the same subsets of HN and LB fish (n = 5 per group), a number of DEGs were detected in both phenotypes.
In the context of iron ion transport, fertn-m was upregulated in HN fish when compared to LB fish in our study. Iron is an essential trace element playing a key role as a biocatalyst or electron carrier for many biological reactions [40]. Because pathogens also require iron for proliferation and production of virulence factors, iron metabolism is closely related to the host innate immune response to prevent pathogens from iron uptake through an iron-withholding strategy [41,42]. This strategy is controlled by the upregulation of positive acute-phase proteins, plasma proteins during acute phase response [40]. Ferritin is among these plasma proteins which segregate excess iron into a non-toxic and organic form for iron storage [43]. fertn-m function in Atlantic salmon was well described in previous studies [44]. fertn-m funtions in both mandatory iron storage stages of iron-oxidation and iron-mineralization [40,45]. Several studies indicate that fertn-m is activated to respond to infections and oxidative stress in fish, including Atlantic salmon [40,[46][47][48][49][50][51]. In our previous studies, we showed that aerobic bacteria (Pseudoalteromonadaceae, Vibrionaceae and Enterobacteriaceae families), were enriched in the hindgut of low color fish. These bacterial families display enriched catabolic pathways which might impact on the host physiology leading to reduced flesh coloration [18,52], suggesting competition over iron uptake between the host and aerobic bacteria. Also, catb and catl were upregulated in HN fish in our present study. catb and catl are cysteine proteases involved in the self-defense mechanism against fish pathogens and the host immune defense in vertebrates [53,54]. catb plays essential functions in pathological process and apoptosis [55]. catb may be associated with the regulation of follicular apoptosis in zebrafish [56] and affected the expression of CY genes in Japanese spiky sea cucumber [57]. They also activate the complex mechanism of fish proteolysis and muscle degradation to respond to diverse environmental and biological parameters [58]. LB fish downregulated catl and catb as compared to HN. This might indicate that LB fish could not activate the self-defense mechanism and respond to environmental conditions as efficiently as HN healthy fish.
Another group of genes (alb2, fabp1, apoc1 and apoa1) related to lipid metabolism were also upregulated in HN fish. alb2 is one of the most abundant proteins in the plasma, considered as the long-chain fatty acid transporter when it binds together to form complexes that circulate in the plasma [59,60]. A radioactive labellingbased study proposed that Ax is associated with the serum lipoproteins, the primary transporters of esterified fatty acids and serum albumin [61]. We have found that alb is upregulated in the gut of HN fish and propose that this upregulation may increase the Ax absorption and transport from the gut into the blood system and then deposition in the muscle or liver. Due to their hydrophobic property, carotenoids or Ax are described to be closely associated with fatty acids and transported (See figure on previous page.) Fig. 7 Comparison of variations within biological replicates of 15 DEGs from the Quantseq and Trueseq libraries and RT-qPCR validation using the same biological samples; Gene expression in (A) QuantSeq library; (B) TruSeq library; (C) RT-qPCR. From left to right, the first five genes were significantly different between HN and LB in both library methods but only ldtn shows a significant difference in RT-qPCR assay; the next five genes showed significant differences only in the QuantSeq library analysis, and only two of those (fabp1 and lyama) were significantly different in RT-qPCR; the last five genes showed significant differences only in the TruSeq library, and only three out of these five genes (ovcm2, noxo1 and fclt3) were significantly different using the RT-qPCR assay. Asterisks (* and **) indicate significant difference between the HN and LB phenotypes at P < 0.05 and P < 0.01, respectively with them through the intestine and blood circulation. Moreover, apoc1 was found to inhibit cholesteryl ester transferase protein activity that leads to increased accumulation of cholesterol in high-density lipoprotein (HDL) and a decrease in low-density lipoprotein (LDL) cholesterol level [62,63]. Finally, upregulation of apoa1 has been linked to increased accumulation of fatty acids and cholesterols into HDL [60,63,64]. HDL is known to have a major role in carrying fatty acids and carotenoids in Atlantic salmon [65,66]. Thus, upregulation of alb2, apoa1 and apoc1 in HN fish implies that the uptake, absorption and metabolism of lipid or carotenoids occurs more intensively in HN fish while it might be partially impaired in LB fish.
The CYP450 enzyme family, together with important functions in the oxidation-reduction process, has also been known to induce stress response as a result of a variation in temperature [67]. In our study, cyp2k1, cyp3a27 and cyp1b1 were significantly upregulated in HN fish. Studies in birds indicated that CYP450 enzymes influence bird coloration in species such as zebra finches, red siskins and common canaries [68,69]. CYP450 genes were also found to be upregulated in red flesh phenotype Chinook salmon when compared to white flesh phenotype [70]. In the latter study, the CYP450 2J19 gene (cyp2j19), which is considered as a carotenoid ketolase and mediates red coloration in birds, was also upregulated in the pyloric caeca of red flesh phenotype Chinook salmon [70], although in the present study, cyp2j19 was not a DEG. This finding suggests that cyp2j19 might not be associated with the red coloration in the Atlantic salmon. However, the three CYP450 genes (cyp2k1, cyp3a27 and cyp1b1) might have a link with flesh pigmentation in our study. Given the rapid evolutionary rate of CYP450s and their multiple roles across many metabolic processes [71,72], it is plausible to assume that the three CYP450s identified as DEGs in our study neo-functionalized to have a role in carotenoid metabolism in Atlantic salmon and more comparative research across salmonids and other species with and without pigmentation should resolve this.
In our comparison, the neuropeptide gastrin-releasing peptide (grp) was upregulated in LB fish when compared with HN fish. Aside from functions in the gastrointestinal tract, some neuroendocrine peptides also impact on short-term food satiation and signal the brain to inhibit feeding [73]. In vertebrates, the process of food uptake and digestion is regulated and balanced by a series of endocrine events. In the gut-brain axis, neuroendocrine peptides are secreted by the stimulation of food intake available in the gastrointestinal tract [73,74]. These peptides may act both through endocrine and neural systems. grp was previously shown to stimulate gastric acid release in the fish gut [73] while inhibiting gastric emptying and mediating satiety in fish [74]. In our current results, grp was upregulated in LB fish, which did not consume food well, while grp expression in HN fish was lower despite normal growth and feed intake. Some studies suggest that in mammals, the gastrinreleasing peptide incorporates a wide-ranging of physiological processes including stress responses, immune function, memory consolidation, and emotional behaviors [75,76]. In rat, gastrin-releasing peptide levels was elevated and released in response to exposure to chronic stressor or a shock [77]. We proposee that grp may function as an anorexigenic signal in the gut-brain axis during prolonged thermal stress. The upregulation of grp in LB fish shows that this phenomenon might only occur when fish experienced chronic stress and starvation. This suggests that in LB fish, the control and balance of food intake with the feeding requirement was still dysfunctional likely due to persisting effects of the high thermal stress conditions during summer. Although the stress had been ceased, feeding activities in LB fish had still to recover completely. In addition, grp was considered as an islet neuropeptide that acts as a stimulator of insulin secretion induced by triggering of nervous system in mice [78]. This finding might apply to other vertebrates and be consistent with our result in the upregulation of grp in LB fish. We propose that grp could stimulate the secretion of insulin to balance energy metabolism in LB fish due to prolonged starvation. Accordingly, prolonged starvation is also the main reason for flesh discoloration, as fish likely lost a large amount of uptaken carotenoids to compensate for the amount used by physiological activities in the fish body.
When detecting whether there is a missense mutation in carotenoid-related genes, we found Pro21Thr missense mutation on abcg8, which highly expressed in HN fish. In humans and other mammals, abcg8 and abcg5 (ATP-binding cassette, sub-family G (white), members 8 and 5; also known as sterolins) are known to participate in the physiological pathways involving dietary cholesterol and non-cholesterol sterols [79]. These sterolins are necessary to secreting cholesterol efficiently into the bile and increase cholesterol levels in plasma and liver when the dietary cholesterol content changes [80,81]. It has been shown that 2 % increase in dietary cholesterol improved Ax absorption and deposition as well as Ax levels in the plasma of Atlantic salmon [82]. Hence, in light of the aforementioned findings, additional cholesterol in Atlantic salmon diets could potentially result in increased Ax levels in the plasma, which would then be absorbed by the muscle, leading to a more intense flesh coloration. In addition, abcg8 might have a critical function in increased carotenoid-binding by cholesterols in HN fish. With regard to to ATP-binding cassette, sub family, Zoric [83] reported that a missense mutation in abcg2-1a, resulting in the substitution of Asparagine with Serine in amino acid position 230 (Asn230Ser), is predicted to be associated with flesh color in Atlantic salmon because it is more prevalent in the pale than in the dark flesh phenotype. The complex abcg5/g8 was previously associated with carotenoid metabolism and was predicted to function in limiting dietary carotenoid [84]. Therefore, in our study, it is possible that Pro21Thr missense mutation on abcg8 plays an important role together with abcg5 in the control of secreting carotenoidbinding cholesterols and absorption of dietary carotenoids and mobilization of carotenoids in Atlantic salmon.
In term of SNPs on DEGs identified in TruSeq, the six mutations on duox2-l and duoxa1-l that we detected in LB fish in the current study might potentially lead to the mismatched complex, which then affects their function of generating reactive oxygen species (ROS). Dual oxidase 2 (Duox2) is a member of the NADPH oxidase (NOX) family that generates reactive oxygen species (ROS). This enzyme catalyses the electron transfer from NAPDH-FAD to O 2 , generating superoxide (O 2 . − ) or hydrogen peroxide (H 2 O 2 ) by reducing molecular oxygen [85]. Duox1 and duox2 function solely in regulating specific maturation factors known as duox activators, Duoxa1 and Duoxa2, respectively [86,87]. In a recent study, Duox and Duoxa proteins were found to form stable heterodimers and co-translocate to the plasma membrane (Fig. 8) [88]. Opposed to Nox1-5, Duox enzymes have an extended N-terminal extracellular domain known as peroxidase homology (PoxH) domain, followed by an additional transmembrane segment and an intracellular loop containing two calcium-binding EFhand motifs (Fig. 8). There is a switch in ROS generation from H 2 O 2 to O 2, which happens when Duox2 is mismatched with Duoxa1. Following that, the mismatched combination of Duox2 + Duoxa1 releases O 2 . − together with H 2 O 2 , leading to the phenomenon of O 2 . − leakage [89,90]. In juvenile rainbow trout, dietary Ax supposedly enhanced the antioxidant defense system, which plays a role in the inactivation of ROS [91]. Consequently, we hypothesize that in LB fish, mutations which enhance ROS, together with a deficiency of Ax or carotenoids can be associated with an impaired antioxidant capability.
The qPCR validation confirms that the TruSeq data performance on genes with low expression was much more precise than the QuantSeq data. In addition, although the first ten genes in Fig. 7 (5 DEGs in both libraries and 5 DEGs in QuantSeq only, respectively) had high expression and the last five genes (5 DEGs in TruSeq only) had low expression in both QuantSeq and TruSeq data, the level of expression of the gene did not affect the RT-qPCR result. In RT-qPCR validation, it is evident that genes with high variation within biological replicates can not be considered as differentially expressed (P > 0.05) even though they were picked as DEGs and expressed highly in the QuantSeq and TruSeq data.
A comparison between TruSeq and QuantSeq on two subsets of distinct HN and LB flesh color phenotypes with five replicates revealed many DEGs in both the low and high sequencing depths. As a result, the sequencing analysis were compared to show the benefits and drawbacks of the two methods. In only QuantSeq with a low depth of sequencing, in the comparisons of 39 samples from 4 color phenotypes, there was a high variation in the number of normalized read counts within each group. A recent study of genetic and phenotypic correlations indicated that improvement in growth did not end in any flesh color variation in Atlantic salmon [35]. It suggested that flesh color might be affected by multiple genes and the response was not uniform. Accordingly, more replicates resulted in increased variation of gene expression between replicate samples and as a consequence, DEGs were hardly shown in our present study with increased sample size.
In the standard whole transcript method, mRNAs was fragmented before converted to cDNA. Therefore, the longer the transcript, the more fragments were created. On the other hand, 3' mRNA method creates only one read for each transcript, so the read count reflects the level of gene expression. Bioinformatic analysis is also simplified since exon junction detection, and the normalization of reads to gene length are not required [31,33]. As expected, TruSeq reads mapped transcripts evenly with a minor decline at the 5' and 3' end. This result is compatible with the finding from a previous study [30]. In contrast, QuantSeq reads covered primarily to the 3' end. In vertebrates, genes can be alternatively spliced to create many distinct and expressed isoforms [33]. We hypothesize that TruSeq method is more suitable to determine differences in gene isoforms than QuantSEq. In addition, the proportion of read numbers mapped using TruSeq should increase with transcript size, while transcript size should not change the proportion of reads mapped to each transcript with Quant-SEq. Therefore, only the TruSeq quantification values should be normalized to transcript length. It was therefore expected that the proportion between TruSeq and QuantSeq reads mapped per transcript should increase in favor of TruSeq reads numbers as the size of transcripts increases [31][32][33].
DEG analysis is one of main approaches of RNA-SEq. Following QuantSeq's instruction, we loaded 96 QuantSeq libraries onto one lane to reduce sequencing cost and create about 2 million reads for each library. As per QuantSeq's recommendation, it requires at least 10 million reads per library for QuantSeq transcriptomics in mammals [32]. Because the Atlantic salmon genome is smaller than those of mammals, we expected 2 million reads for each QuantSeq library would satisfy the recommendation. With TruSeq method, over 20 million reads for each library were generated to achieve the minimum requirement of DGE analysis. Therefore, Tru-Seq costs about 6 times more than QuantSeq per sample, however the cost/million reads is the same for either TruSeq or QuantSEq. Some studies have researched the effect of sequencing depth on RNA-Seq, in which a higher power was reached as sequencing depth rises (below 20-30 million reads) [92,93]. Using DESq2 for differential expression analysis, we found that TruSeq detected more DEGs with low expression than Quant-SEq. A previous study compared the efficiency of KAPPA whole transcript and Lexogen 3'RNA library preparation methods using the same depth of sequencing. This study also found that the whole transcript RNA-Seq method detected more DEGs, while more short transcripts was found using QuantSeq [30]. In human, a comparison of transcritompic analysis between Illumina TruSeq and QuantSeq in higher sequencing depth (30 million reads per TruSeq and 10 million reads per QuantSeq sample) was undertaken. Truseq again detected more DEGs then QuantSeq [32]. Moreover, when the sequencing depth of QuantSeq was elevated, the relationship between the TruSeq and QuantSeq results was strongly correlated at the gene expression levels [32]. This suggests that the depth of sequencing effects the number of detected DEGs, indicating that future salmon transcriptomics projects might benefit from having more than 2 million reads per library.

Conclusions
In this study, we found that flesh pigmentation in Atlantic salmon is associated with lipid metabolism, specifically apolipoproteins, serum albumin and fatty acidbinding protein genes which are proposed to play important functions in the absorption, transport and deposition of carotenoids. Moreover, we speculate that grp, which is highly expressed in low banded fish, could inhibit their feeding behavior, leading to the dietary carotenoid shortage, resulting in the flesh discoloration in Atlantic salmon. Several SNPs in genes relating to carotenoid-binding cholesterol and oxidative stress were found in both flesh color phenotypes. Future studies are warranted to confirm if these SNPs have a strong relationship to flesh color, in which case they can be used to understand further the mechanisms involved in flesh color in Atlantic salmon and be tested for suitability as markers for selective breeding programs.
The comparison between the two gene expression studies showed that Lexogen QuantSeq is a costeffective method of library preparation that targets known genes with high expression and no splicing variants. Due to its lower cost of library preparation, a higher number of samples could be analyzed, thus significantly reducing the number of false-positive DEGs detected. For longer transcripts with many isoforms or unknown genes, it is recommended to apply the Illumina TrueSeq method where lowly expressed genes and gene isoforms can be discovered.

Experimental fish and sampling
In June 2017 (end of autumn), 50 fish (3.01 ± 0.09 kg, Mean ± SEM) were sampled randomly from a commercial population at a marine farm in Tasmania, Australia. The time point was chosen based on historical observations at that site showning that some fish can display flesh discoloration at different levels of severity as a result of elevetad summer temperatures. Fish were crowded in repeats utilizing a box-net routinely used for weight checks, and randomly sampled out of the crowd. Fish were euthanized through a non-recoverable percussive blow to the head using an automatic stunner, then put into an ice slurry container and immediately transported to the sampling site. For each fish, the hindgut was collected and preserved in RNAlater (Invitrogen, USA). Samples were kept overnight at 4 o C before storing at -80 o C until further analysis.
After sampling the hindgut, each fish had its flesh color visually assessed in standardized lighting conditions. After hand-filleting and trimming, the left fillet was color assessed and a score from 20 to 34 assigned to it by using the Roche SalmoFan™ Lineal Card (Hoffman-La Roche, Basel, Switzerland) (Fig. 9). To categorize localized discoloration, commercially known as "banding", two regions were selected: the front dorsal region (FD; often affected by discoloration) and the back central (BC; which tends to retain color more evenly). The average score of color between the two regions was computed to classify as HIGH fillets when flesh color was higher than 24 on the SalmoFan or as LOW for flesh color equal or lower than 24. The severity of discoloration (banding) was based on the color score difference between the back central and the front dorsal region (severity = BC score -FD score) and it was classified as NONE for a difference from 0 to 1, or as BANDING from 2 to 3. Following the color assessment, 39 fis h were selected for analysis based on two criteria: average flesh color score and severity of banding; and divided into four groups: 13 high color-no banding (HN), 8 high color-banded (HB), 9 low color-no banding (LN) and 9 low color-banded (LB). There was no significant difference in Fulton's condition factor (K) between the four groups (1.59, 1.69, 1.55 and 1.54 for HN, HB, LN and LB, respectively, P > 0.05). All fish work was carried out in compliance with the University of the Sunshine Coast (USC) animal ethics committee (AN/E/16/12). We confirm the study was carried out in compliance with the ARRIVE guidelines.

RNA extraction, library preparation and sequencing
Total RNA was extracted from hindgut using RNeasy mini Kit (Qiagen, Cat no./ID: 74,104), treated with DNase I (RNase-free) (Biolabs, Cat no. M0303S) for gDNA removal, according to the manufacturer's manual. The quantity, quality and integrity of the extracted RNA were assessed using NanoDrop 2000 spectrophotometer (Ther-moFisher Scientific, USA) and Agilent 2100 Bioanalyzer (Agilent Technologies, Inc., USA). RNA samples used in the library preparation were chosen based on the following criteria: A260/280 ratio within 2.0-2.2, A260/A230 ratios > 1. 8 (Fig. 10). Briefly, the first cDNA strand is synthesized using an oligo (dT) primer containing the Illumina-specific Read 2 linker sequence (P7). A random primer containing the Illumina-specific Read 1 linker sequence (P5) and DNA polymerase were then used to synthesize the second cDNA strand. Paired-end sequencing is not recommended for QuantSeq library owing to the low quality of Read 2. Read 2 would begin with poly(T) stretch and sequence through the homopolymer stretch. Library quantity was measured using Quantus Fluorometer (Promega Corporation). Sequencing was done employing the Illumina HiSeq 2500 (100 bp, single-end) at the Australian Genome Research Facility.
A fish displaying HN was considered as an unaffected individual where flesh color is high and even, while a fish displaying LB as a heavily affected individual with abnormal flesh color. Therefore, given the difference between the two extreme phenotypes, HN and HB were predicted to have distinct genotypes. Therefore, in the comparison of the effectiveness of QuantSeq and TruSeq libraries, the same ten randomly chosen extracted RNA samples which were used for QuantSeq library were selected from two groups: five of extremely HN and five of extremely LB for repeated sequencing using Illumina library preparation. Extracted RNA samples were sent to Novogene (Hong Kong) for library preparation using Illumina TruSeq DNA PCR-Free Library Preparation kit, followed by sequencing on the Illumina HiSeq2500 (150 bp, pair-end reads).

Transcript coverage
Following mapping to the Atlantic salmon genome, all 20 BAM files from both TruSeq and QuantSeq data were loaded onto RSeQC v2.6.4 [97] to calculate transcript coverage. SH2 domain-containing protein 4 A-like (sh24a) gene coverage was visualized using two BAM files from TruSeq and QuantSeq of the same sample using Integrative Genomics Viewer [98].

Distribution of mapped reads on different mature mRNA length
In QuantSeq, the number of mapped reads for each mature mRNA and mature mRNA length were scatterplotted using ggplot2 package in RStudio 1.3.1093. In TruSeq, the number of mapped reads for each mature mRNA (RC) was normalized with mature mRNA length (L) following the formula below before scatter-plotting as done with QuantSeq.
Where NRC i = normalized reads count of mature mRNA i; RC i = read counts of mature mRNA i and L i = length of mature mRNA i (bp).
Mature mRNAs were firstly divided into eleven various groups based on their length at 1000 bp intervals (200-1000 bp, 1001-2000 bp and so on), with the last group being the total mature mRNA longer than 10,000 bp. A total number of mapped mature mRNA was then divided by the total number of mature mRNA in each group to identify the proportion of mRNA to which reads mapped from each library, at each length group.
Differential expression analysis and gene ontology annotation DEG analysis was conducted by DESeq2 in R (version 3.5.3) [99]. A gene was considered as DEG between two conditions when the following applied: |log 2 (fold change)| > 1, padjusted value < 0.05. The number of overlapping DEGs from the two datasets (TruSeq and QuantSeq datasets) was visualized using InveractiVenn [100]. All DEGs were then loaded into OmicsBox v1.4 (OmicsBox -Bioinformatics Made Easy, BioBam Bioinformatics, March 3, 2019, https://www.biobam.com/ omicsbox) for further downstream analysis. Here, DEGs were searched against the NCBI non-redundant database (nr) using BlastX algorithm with E-value threshold set at 1.00E -6 . Blasted transcripts were afterwards annotated then classified into three ontology categories: molecular function, cellular component and biological process. The distribution of gene onthology (GO) terms from two different libraries was subsequently plotted using Web Gene Ontology Annotation Plot v.2.0 (WEGO 2.0) software [101].

Real-time quantitative PCR
Fifteen genes were chosen from three clusters based on strong significance of Log2FoldChange, regardless of their function: five DEGs in both library methods; five DEGs only in QuantSeq library; and five genes only in TruSeq library. 18 S rRNA was used as a housekeeping gene (HKG). Sixteen pairs of primers were designed using the 'Assay Design Center' available at the Roche website (https://qpcr.probefinder.com/organism.jsp) and listed in Table 2. Real-time quantitative polymerase chain reaction (RT-qPCR) was performed as previously described [102] with slight modifications to validate DEGs found in either or both sequencing datasets. cDNA templates were synthesized using Bioline Tetro cDNA Synthesis kit (Cat. No. BIO-65,043) from the same RNA samples used for sequencing. Primers were mixed with the cDNA and FastStart Universal Probe Master (Rox; Roche Diagnostics GmbH) and Universal ProbeLibrary Probe (Roche). The HKG 18 S rRNA served to normalize the quantification. qPCR cycles included 10 min incubation at 94°C, followed by 40 cycles of 94°C for 10 s and 60°C for 30 s, with green fluorescence measurement on each cycle at 60°C. Reactions were performed in Rotor-Gene Q (Qiagen). Relative quantification was calculated by equilibrating to the level of HKG per sample and the sample with the lowest value (2 −ΔΔCT ). Statistical analysis of the resulting RQs was performed using ANOVA, followed by Wilcoxon test, with P < 0.05 considered as statistically significant.

Single nucleotide-polymorphisms (SNPs) analysis
From the Illumina TruSeq data, two sets of variants were detected with FreeBayes [103], a Bayesian genetic variant detector designed to find polymorphisms, specifically SNPs. All SNPs were annotated and the effects of the SNP variants on genes and proteins were determined with Ensembl Variant Effect Predictor (VEP) [104]. All