RNA-seq based detection of differentially expressed genes in the skeletal muscle of Duroc pigs with distinct lipid profiles

We have used a RNA-seq approach to investigate differential expression in the skeletal muscle of swine (N = 52) with divergent lipid profiles i.e. HIGH (increased intramuscular fat and muscle saturated and monounsaturated fatty acid contents, higher serum lipid concentrations and fatness) and LOW pigs (leaner and with an increased muscle polyunsaturated fatty acid content). The number of mRNAs and non-coding RNAs (ncRNAs) expressed in the porcine gluteus medius muscle were 18,104 and 1,558, respectively. At the nominal level of significance (P-value ≤ 0.05), we detected 1,430 mRNA and 12 non-coding RNA (ncRNA) transcripts as differentially expressed (DE) in the gluteus medius muscle of HIGH vs LOW pigs. This smaller contribution of ncRNAs to differential expression may have biological and technical reasons. We performed a second analysis, that was more stringent (P-value ≤ 0.01 and fold-change ≥ 1.5), and only 96 and 0 mRNA-and ncRNA-encoding genes happened to be DE, respectively. The subset of DE mRNA genes was enriched in pathways related with lipid (lipogenesis and triacylglycerol degradation) and glucose metabolism. Moreover, HIGH pigs showed a more lipogenic profile than their LOW counterparts.

Differential expression of non-coding RNAs. We identified 1,558 ncRNA transcripts expressed in the pig gluteus medius muscle, with sizes between 53 and 9,032 bp (Supplementary Table S7). Amongst these, 1,354 and 204 transcripts were classified as small (sncRNA) and long (lncRNA) non-coding RNAs, respectively. It is important to emphasize that the annotation of porcine ncRNAs is still very preliminar and it should be taken with caution. In general, sncRNA had orthologous sequences in other mammalian species, while lncRNAs were much less conserved (Table 3). We only detected 12 ncRNAs (11 lncRNAs and 1 sncRNA) that were DE at the nominal level (P-value ≤ 0.05), while none of these ncRNAs remained significant after correction for multiple testing (in all cases the q-value was non-significant, Table 4).
In addition, we identified 25 mRNA-encoding genes that mapped near (30 kb or less) to the subset of DE ncRNA loci (Table 5). This observation may have biological implications because ncRNAs often cis-regulate the expression of genes located in their vicinity. Within this list of neighbouring genes (Table 5)

Discussion
Divergent muscle mRNA expression profiles in pigs with extreme phenotypes for fatness traits. After correcting for multiple testing, twenty-one genes, displaying a wide array of functional roles, showed a significant DE between HIGH and LOW pigs (Table 1). For instance, SLC27A4 is involved in the translocation of long-chain fatty acids across the plasma membrane 9 while SFRP5 plays a role in anti-inflammatory and insulin-sensitizing processes 10 and AGO2 and MVP contribute to RNA interference 11 and signal transduction and transport 12 , respectively. Two of the genes listed in Table 1 might be related with meat quality i.e. RNF181, which encodes a E3 ubiquitin-protein ligase that participates in the degradation of muscle proteins through the ubiquitin-proteasome system 13 , and SDK1, which has been associated with intramuscular fat (IMF) content in Large White pigs 14 .
The Spearman correlation between the microarray data reported by Canovas et al. 7 in 68 HIGH and LOW pigs and RNA-seq data generated in the current study (N = 52) was 0.54. This value is comparable to what has been published in previous studies analysing gene expression in human brain cells (r = 0.61-0.67) 15 and proliferating vs quiescent fibroblasts (r = 0.18-0.42) 16 . We also compared our dataset of DE genes with those detected by Canovas et al. 7 . As shown in Suppl Figure 1 the level of concordance was quite low (only 76 genes were simultaneously identified by both platforms). A modest overlap between microarray and RNA-seq data has been reported in previous studies. For instance, Trost et al. 16 analysed the concordance between both types of data in fibroblasts cultured at two different developmental stages, and they just found an overlap of around 25% in the two lists of DE genes. This value is higher than the one reported by us, but it is important to highlight that the analysis of Trost et al. 16 was based on a set of probes common to both platforms. Moreover, the microarray analysis performed by Canovas et al. 7 was based on a dataset of around 68 pigs, while we used a subset of 52 individuals in our RNA-seq analysis. Trost et al. 16 used quantitative real-time PCR as a third approach to validate microarray and RNA-seq data and they found that RNA-seq outperforms the microarray technology. However, differences between both methods are not dramatic i.e. the Spearman correlations between microarray and RNA-seq data vs qPCR validation results were 0.44 and 0.56, respectively. This means that both technologies detect different sets of DE expressed genes and, in consequence, they are complementary 17 . According to Wang et al. 18 , the magnitude of the treatment effect has a strong impact on the level of concordance between microarray and RNA-seq platforms i.e. large discrepancies can be anticipated when two similar biological conditions are compared. Low-abundance transcripts are another source of discrepancy between both methodological approaches 18 .
We found some evidence that pathways related with lipid synthesis (stearate, palmitate and FA synthesis) and catabolism (triacylglycerol degradation), glucose metabolism (glucose synthesis and degradation) and hormonal response (growth hormone signalling) were enriched in the set of DE genes ( Table 2 and Supplementary Table S4). Similar results were obtained by Cánovas et al. 7 i.e. they detected an overexpression of pathways related with the synthesis of FA and insulin signaling in HIGH pigs. Puig-Oliveras et al. 1 compared the muscle mRNA expression of pigs with high saturated (SFA) and monounsaturated (MUFA) FA muscle contents against those with a high polyunsaturated FA (PUFA) content and also observed an enrichment of pathways related with fat deposition  Table 2. IPA-based pathway analysis of the list of genes that are differentially expressed in HIGH and LOW pigs (P-value ≤ 0.01 and fold-change ≥ 1.5). Ratio: number of DE genes in a pathway divided by the number of genes comprised in the same pathway.
(PPAR and insulin signalling) in the set of DE genes. Insulin stimulates the absorption of glucose, which is a lipogenic substrate, and PPARG enhances triglyceride storage 19 . By using the same animal material employed by Puig-Oliveras et al. 1 , Corominas et al. 5 observed an overexpression of genes belonging to the LXR/RXR activation pathway in the adipose tissue of pigs with high muscle SFA and MUFA contents. These results, which agree well with ours (Supplementary Table S4), make sense because liver X receptors are sterol-activated transcription factors that enhance lipogenesis 20 .
Though not all studies comparing pigs with divergent lipid phenotypes identify the same sets of pathways, an outcome that partly depends on the software and databases used as well as on the targeted tissue and phenotype variability, the general trend that emerges is that biochemical routes that promote lipid deposition are overexpressed in the skeletal muscle of fat pigs with high muscle SFA and MUFA contents. In close concordance with a previous study 7 , we have also found that one gene that promotes the catabolism of triglycerides, carboxylesterase 1 (CES1), is strongly upregulated in HIGH pigs (P-value = 0.0006, FC = 2.4). The CES1 protein has hydrolase activity and its inactivation leads to hyperlipidemia and increased fat deposition in peripheral tissues, obesity, fatty liver, hyperinsulinemia and insulin insensitivity and a decreased energy expenditure 21 . According to Cánovas and coworkers 7 , the upregulation of lipolytic genes in HIGH pigs suggests the existence of a cycle where triacylglycerols are continuously synthesized and degraded. However, we have also detected the downregulation of lipolytic genes such as lipase C, hepatic type (LIPC, P-value = 0.002, FC = −1.5) 22 , a feature that suggests that the mechanisms that promote an adequate balance between anabolic and catabolic lipid metabolism routes are highly complex.
Analysis of the data with the IPA software (QIAGEN) showed that the top-scoring regulatory network was Cardiovascular Disease, Cardiovascular System Development and Function, Organismal Injury and Abnormalities, a result that it is not surprising given the tight relationship between lipoprotein metabolism and cardiovascular risk 23 . In the network shown in Fig. 1, the V-Akt murine thymoma viral oncogene homolog molecule (AKT) occupies a central position, having connections with several DE lipid-related genes (e.g., TRIB3, TIMP1 and ITGA5). Interestingly, AKT is one of the main regulators of glucose homeostasis 24 , a feature that is consistent with the existence of tight links between lipid and carbohydrate metabolism.
When we used the Regulator Effects tool of IPA, the PPARG and PDGFB genes were predicted to be major transcriptional regulators of the set of 96 DE loci (Fig. 2). The PPARG transcription factor is critically required for adipogenesis, being a powerful modulator of whole-body lipid homeostasis and insulin sensitivity 25 . Polymorphism in the PPARG gene is associated with individual susceptibility to type 2 diabetes, obesity and body mass index 26 . In our study, PPARG is upregulated (P-value = 0.02 and FC = 1.36) in HIGH pigs and appears to regulate several genes, such as CEBPA (P-value = 0.009 and FC = 1.64), CES1 (P-value = 0.0004 and FC = 2.03), Limited contribution of the non-coding RNA transcriptome to differential expression between HIGH and LOW pigs. Non-coding RNAs have been shown to regulate gene expression by interacting with chromatin complexes, working as RNA enhancers, recruiting or assembling certain proteins and interacting with other RNAs at the post-transcriptional level 27 . In consequence they may play a fundamental role in the metabolism of the porcine skeletal muscle. In our study, we have identified 1,558 muscle-expressed ncRNA transcripts In the upper tier, there are two upstream regulators (PPARG and PDGFB) predicted to be activated (orange color). In the middle tier, we can see the genes whose expression changes in response to the activation of upstream regulators (red = upregulation). The shapes of the nodes reflect the functional class of each gene product: enzyme (vertical rhombus), transcription regulator (vertical ellipse), cytokine/growth factor (square), ligand-dependent nuclear receptor (horizontal rectangle) and complex/group/ other (circle). In the lower tier, the expected phenotypic consequences of changes in gene expression are shown by considering the Ingenuity Knowledge Base (absolute z-score > 2 and P-value < 0.05). The octagonal symbol defines Function, while solid and dashed lines between genes represent known direct and indirect gene interactions, respectively. Orange leads to activation, while blue leads to inhibition predicted relationships. Orange (predicted to be activated) and blue (predicted to be inhibited) lines represent relationships with causal consistency.  (Supplementary Table S7). The total number of ncRNAs in the pig genome is currently unknown, but Zhou et al. 28 highlighted the existence of at least 6,621 long intergenic non-coding RNAs (lincRNA) transcripts encoded by 4,515 gene loci. In humans, 58,648 lncRNA encoding loci have been identified so far 29 . In our dataset (Table 3), the degree of evolutionary conservation of sncRNAs happened to be much higher than that of lncRNAS. Zhou et al. 28 characterized the porcine lincRNA transcriptome and found that only 40% of the transcripts had a detectable human lincRNA ortholog. This scarcity of orthologous sequences can be due, in part, to the poor annotation of ncRNAs in all investigated species.  There is growing evidence that there might be a positive correlation between the expression of ncRNAs and nearby mRNA encoding genes, suggesting that the former may regulate the expression of the latter 30 . We investigated this issue by analysing if there are DE protein-coding genes in the vicinity of any of the 12 DE ncRNAs identified in our work (P-value ≤ 0.05, Tables 4 and 5). Two protein-coding genes, i.e. mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 6 (MT-ND6) and CU468594.8, fulfilled this condition (P-value ≤ 0.05 and FC ≥ 1.2, Table 5). The MT-ND6 gene encodes a NADH dehydrogenase that catalyses the oxidation of NADH by ubiquinone, an essential step in the mitochondrial electron transport chain 31 . The CU468594.8 locus is orthologous to human solute carrier family 52-riboflavin transporter, member 2 (SLC52A2). Riboflavin is the precursor of flavin adenine dinucleotide (FAD) and flavin mononucleotide (FMN), two essential cofactors that participate in a wide range of redox reactions 32,33 .
We aimed to ascertain if differences amongst HIGH and LOW pigs, in terms of IMF content and composition, are mainly due to the DE of either mRNA or ncRNA encoding genes. When considering a nominal P-value of 0.05 as a threshold of significance, the number of DE ncRNAs (12 loci) was much smaller than that of DE mRNAs (1,430 loci), even if we take into account that the number of expressed mRNAs (18,104) was also higher than that of ncRNAs (1,558). Moreover, none of the DE ncRNAs remained significant after correction for multiple testing. In a recent experiment, the transcriptome of pig endometrial samples collected at different pregnancy stages was characterized, and 2,376 transcripts were identified as DE in pairwise comparisons 34 . Only 12% of these transcripts corresponded to lncRNAs indicating that changes in the endometrial transcriptome associated with pregnancy mainly affect the expression of protein-coding genes. However, studies performed in humans indicate a much more balanced contribution of mRNAs and ncRNAs to differential expression. For instance, Wang et al. 35 investigated the expression patterns of peripheral leukocytes of healthy and autistic individuals and identified 3,929 and 2,591 DE lncRNAs and mRNAs, respectively. Similarly, Zhou et al. 36 identified 891 and 576 DE mRNAs and lncRNAS, respectively, when comparing the expression patterns of ectopic and eutopic endometrial tissue. These differences between humans and pigs are probably the consequence of technical rather than biological causes, evidencing the pressing need of improving the genomic and functional annotation of porcine ncRNAs.

Conclusions
By comparing the mRNA expression of HIGH and LOW pigs by RNA-seq, we have identified 96 loci displaying differential expression (P-value ≤ 0.01 and FC ≥ 1.5). Many of these loci were not detected in a previous microarray-based experiment, suggesting that distinct platforms detect different sets of DE genes. Lipid biosynthetic pathways were enriched in DE genes and upregulated in HIGH pigs, a result that is consistent with previous reports. We have also undertaken the analysis of non-coding RNAs, a feature that has been neglected in previous studies investigating the differential expression of porcine genes. Our results indicate that the number of DE non-coding RNAs is much lower than that of mRNAs, an outcome that might be partly explained by the poor annotation of porcine ncRNAs.

Material and Methods
Ethics statement. All experiments were performed in accordance with the ARRIVE guidelines (https:// www.nc3rs.org.uk/arrive-guidelines). Animal care and management procedures were approved by the Ethical Committee of the Institut de Recerca i Tecnologia Agroalimentàries, IRTA.
Animal Material. One population of 350 Duroc barrows belonging to 5 half-sib families, and distributed in 4 fattening batches was generated in 2003. All animals were kept under the same feeding and management conditions 37 . A wide array of growth, fatness, feed efficiency and carcass and meat quality traits were recorded in these animals, including weight, daily food intake, fat deposition, and IMF content and composition (C:12-C:22 interval) of the gluteus medius muscle 7 . By using a principal component analysis based on 13 lipid-related traits, we selected two groups of pigs, i.e. HIGH and LOW, displaying distinct phenotypic profiles 7 (Supplementary Table S8). Compared with their LOW counterparts, HIGH pigs were fatter and they had a higher IMF, SFA and MUFA muscle contents as well as elevated serum lipid concentrations 7 . LOW pigs, in contrast, had a higher muscle PUFA content 7 .
RNA isolation and library construction and sequencing. Total RNA was isolated from 56 porcine gluteus medius muscle samples (28 HIGH and 28 LOW) by using the acid phenol method implemented in the RiboPure kit (Ambion, Austin, TX). Total RNA was quantified in a Nanodrop ND-1000 spectrophotometer, checked for purity and integrity in a Bioanalyzer-2100 device (Agilent Technologies, Inc., Santa Clara, CA) and submitted to the Centre Nacional d' Anàlisi Genòmica (CNAG, http://www.cnag.cat) for sequencing. Libraries were prepared using the TruSeq RNA Sample Preparation Kit (Illumina Inc) according to the protocols recommended by the manufacturer. Each library was paired-end sequenced (2 × 75 bp) by using the TruSeq SBS Kit v3-HS, in a HiSeq2000 platform.
Bioinformatic analyses. All bioinformatic analyses were performed with the CLC Bio Workbench software (CLC Bio, Aarhus, Denmark). Quality control was carried out with the NGS Core Tools, considering several parameters based on the FastQC-project (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). We carried out per-sequence and per-base analyses to filter reads according to the following criteria: sequence-read distribution = 75 bp, 100% coverage in all bases, GC-content ~50%, ~25% of A, T, G and C nucleotide contributions, ambiguous base-content < 0.1% and a Phred score higher than 30 (i.e. base-calling accuracy larger than 99.9%). Short sequence reads were assembled, mapped and annotated by using as template the pig reference genome version 10.2 (Sscrofa10.2-http://www.ensembl.org/info/data/ ftp/index.html). For mapping purposes, we just considered alignments with a length fraction of 0.7 and a similarity fraction of 0.8. Besides, two mismatches and three insertions and deletions per read were allowed.
Gene expression data were normalized by calculating the reads per kilobase per million mapped reads (RPKM) 38 . Using scales of abundance estimates by exon length and millions of mapped reads, original expression values were transformed and normalized. More specifically, data were transformed on a decimal logarithmic scale and a scaling algorithm was utilized for the normalization of average scores 39 . For the statistical analysis of differential expression, we used a two-tailed t-test that assumes a Gaussian distribution and homogeneous variances. This statistical test compares the mean expression levels in the two experimental groups (HIGH vs LOW) and evaluates the significance of the difference relative to the variance of the data within the groups. Multiple testing correction was performed by using a false-discovery rate approach (cut-off = 0.05) implemented in the QVALUE R package 40 . Fold-Change was computed as the ratio of HIGH vs LOW gene expressions (a negative FC means that the affected gene is upregulated in LOW pigs).
Ingenuity Pathway Analysis (IPA, QIAGEN Redwood City, www.qiagen.com/ingenuity) was used to identify gene ontologies, pathways, and regulatory networks to which DE genes belong to, as well as upstream regulators. Ingenuity Pathway Analysis can transform a set of genes into a number of relevant networks based on comprehensive records maintained in the Ingenuity Pathways Knowledge Base. Networks are presented as graphs depicting the biological relationships between genes/gene products. Genes are shown as nodes, and the molecular relationship between two nodes is represented with either a solid (direct interactions) or a dashed (indirect interactions) line. The analysis of upstream regulators considers every possible transcription factor and upstream regulator contained in the Ingenuity Knowledge Base repository as well as their predicted effects on gene expression (inferred from the scientific literature). Then, this tool analyses if the patterns of expression observed in the DE genes can be explained by the activation/inhibition of any of these regulators through the calculation of a z-score i.e. a statistical measure of the match between expected relationship direction between the regulator and its targets and observed gene expression 41 . A parallel analysis was performed with the Cytoscape software 42 by using the ReactomeFIViz app 8 . IPA and Cytoscape analyses were performed on a subset of DE genes, with P-value ≤ 0.01 and a FC ≥ 1.5. Transcript classification and the search of homologs of porcine ncRNAs in other mammalian species were carried out with tools implemented in the BioMart web interface (http://www.ensembl. org/biomart/martview).