Integrated Analysis of Proteomic and Transcriptomic Data Highlights Late Fetal Muscle Maturation Process*

In pigs, the perinatal period is the most critical time for survival. Piglet maturation, which occurs at the end of gestation, is an important determinant of early survival. Skeletal muscle plays a key role in adaptation to extra-uterine life, e.g. motor function and thermoregulation. Progeny from two breeds with extreme neonatal mortality rates were analyzed at 90 and 110 days of gestation (dg). The Large White breed is a highly selected breed for lean growth and exhibits a high rate of neonatal mortality, whereas the Meishan breed is fatter and more robust and has a low neonatal mortality. Our aim was to identify molecular signatures underlying late fetal longissimus muscle development. First, integrated analysis was used to explore relationships between co-expression network models built from a proteomic data set (bi-dimensional electrophoresis) and biological phenotypes. Second, correlations with a transcriptomic data set (microarrays) were investigated to combine different layers of expression with a focus on transcriptional regulation. Muscle glycogen content and myosin heavy chain polymorphism were good descriptors of muscle maturity and were used for further data integration analysis. Using 89 identified unique proteins, network inference, correlation with biological phenotypes and functional enrichment revealed that mitochondrial oxidative metabolism was a key determinant of neonatal muscle maturity. Some proteins, including ATP5A1 and CKMT2, were important nodes in the network related to muscle metabolism. Transcriptomic data suggest that overexpression of mitochondrial PCK2 was involved in the greater glycogen content of Meishan fetuses at 110 dg. GPD1, an enzyme involved in the mitochondrial oxidation of cytosolic NADH, was overexpressed in Meishan. Thirty-one proteins exhibited a positive correlation between mRNA and protein levels in both extreme fetal genotypes, suggesting transcriptional regulation. Gene ontology enrichment and Ingenuity analyses identified PPARGC1A and ESR1 as possible transcriptional factors positively involved in late fetal muscle maturation.

In pigs, the perinatal period is the most critical time for survival. Piglet maturation, which occurs at the end of gestation, is an important determinant of early survival. Skeletal muscle plays a key role in adaptation to extrauterine life, e.g. motor function and thermoregulation. Progeny from two breeds with extreme neonatal mortality rates were analyzed at 90 and 110 days of gestation (dg). The Large White breed is a highly selected breed for lean growth and exhibits a high rate of neonatal mortality, whereas the Meishan breed is fatter and more robust and has a low neonatal mortality. Our aim was to identify molecular signatures underlying late fetal longissimus muscle development. First, integrated analysis was used to explore relationships between co-expression network models built from a proteomic data set (bi-dimensional electrophoresis) and biological phenotypes. Second, correlations with a transcriptomic data set (microarrays) were investigated to combine different layers of expression with a focus on transcriptional regulation. Muscle glycogen content and myosin heavy chain polymorphism were good descriptors of muscle maturity and were used for further data integration analysis. Using 89 identified unique proteins, network inference, correlation with biological phenotypes and functional enrichment revealed that mitochondrial oxidative metabolism was a key determinant of neonatal muscle maturity. Some proteins, including ATP5A1 and CKMT2, were important nodes in the network related to muscle metabolism. Transcriptomic data suggest that overexpression of mitochondrial PCK2 was involved in the greater glycogen content of Meishan fetuses at 110 dg. GPD1, an enzyme involved in the mitochondrial oxidation of cytosolic One objective of systems biology is to investigate the regulation and interaction of various components of the cell including DNA (genomics) (1), mRNA (transcriptomics) (2), proteins (proteomics) (3) or metabolites (metabolomics) (4). Even though transcriptomic analysis provides deep insights into cellular processes, possible conclusions are limited (5). Indeed, mRNA expression is not always a good predictor of protein level because low correlations between mRNA and protein expression levels are often observed. For example, a relatively small change in mRNA expression can result in a major change in the total abundance of the corresponding protein, enabling potentially different conclusions to be drawn from transcriptomic and proteomic analyses.
Biological integration of transcriptome and proteome remains challenging in omics studies because of the marked differences between the two approaches, e.g. the dynamic range of expression, incomplete annotation or isoform differences (6). The expression of genes may not be correlated with the abundance of proteins and provide no information on post-transcriptional events, e.g. translational efficiency, alternative splicing, folding or assembly into complexes (5,7,8). An integrated multiomics approach, with gene expression experiments and large-scale protein identification experiments, should provide a deeper understanding of the functional interactions between mRNA and protein layers as a complex biological system. Several biological data integration strategies have already been suggested, see review by Ritchie et al. (2015) (9). In this review, the authors suggested three meta-dimensional analyses combining multiple data types in the same analysis: concatenation, transformation and modelbased integration. Here we chose to develop an innovative integration strategy combining state-of-the-art statistical and computational methods using proteomic and phenotypic data, with the incorporation of transcriptomic information, to observe and identify some important proteins with possible transcriptional regulation. One of the advantages of our strategy is to avoid the scale issues that may arise when different types of data are combined.
Networks are increasingly used as tools for analysis and for the visualization of data in biology and genetics (10 -13). A complex network architecture into clusters of functionally related genes/proteins can be explored and genes/proteins with high connectivity (called hubs) can be identified. In our study, integrated network analysis was first performed to explore relationships between co-expression network models, built from a proteomic data set, and phenotypes of interest that would enable identification of important molecular signatures underlying late fetal muscle development. Correlations with a transcriptomic data set were then investigated to complete and combine different layers of expression.
Here we report the results of multiomics analyses of muscle during the last 3 weeks of gestation in pigs. The objective was to identify proteins, with possible transcriptional regulation, and related biological mechanisms, those involved in differences in the muscle maturation process during late gestation between two extreme pig breeds: Large White (LW) 1 , Meishan (MS), and reciprocal crosses. The LW breed is a highly selected breed for lean growth with a high rate of mortality at birth, whereas MS is a fatter and more robust breed that produces piglets with an extremely low mortality rate (14,15). Physiological muscle maturity, which occurs at the end of gestation, has already been shown to improve early survival after birth (16 -18). Successful maturation in late gestation thus likely leads to a state of full development and promotes better early survival after birth. Postnatal mortality because of immaturity is not only an issue in pigs but affects other mammals including sheep (19) and humans (20,21). Adaptation to extra-uterine life is therefore a major factor in the survival of all mammal species. In the present study, using network data integration analysis, we identified key proteins involved in piglet maturity during late gestation and provided an overview of the muscle maturation process in late gestation of which many aspects can be generalized to other mammals.

EXPERIMENTAL PROCEDURES
Ethics Statement-Use of animals and procedures performed in this study was approved by the European Union legislation (directive 86/609/EEC) and French legislation in the Midi-Pyré né es Region of France (Decree 2001-464 29/05/01; accreditation for animal housing C- 35-275-32). The technical and scientific staff obtained individual accreditation (MP/01/01/01/11) from the ethics committee (ré gion Midi-Pyré né es, France) to experiment on living animals. All the fetuses used in this study were males and were obtained by caesarean.
Animal Study Design-Details regarding animal resources and experimental designs can be found in (18). Briefly, longissimus skeletal muscle mRNA, protein and phenotypic data were acquired at two developmental time points (90 and 110 days of gestation (dg)) from four fetal genotypes. These genotypes consisted in two extreme breeds for mortality at birth (Meishan (MS) and Large White (LW)) and two crosses (MSLW from LW sows and LWMS from MS sows). Pure MS and LW sows were inseminated with mixed semen so that each litter was composed of purebred and crossbred fetuses. The two developmental time points correspond to the end of gestation when intense maturation of muscle fibers occurs from 90 dg to birth (around 114 dg) (16,17,22). Therefore, eight conditions were considered according to the two developmental time points and the four fetal genotypes (n ϭ 64 fetuses exhibiting birth weight close to the average body weight within litter and genotype, n dg90.MS ϭ 8, n dg90.LW ϭ 8, n dg90.MSLW ϭ 8, n dg90.LWMS ϭ 8, n dg110.MS ϭ 8, n dg110.LW ϭ 10, n dg110.MSLW ϭ 6, and n dg110.LWMS ϭ 8).
Muscle Sampling and Biochemical Analysis-The longissimus muscle (LM) was collected at the last rib level within 30 min after death, cut into small pieces, snap frozen in liquid nitrogen, and stored at Ϫ75°C until further analyses. Glycogen content was determined in LM according to the method described by Good et al. (23) with minor adaptations (24), and is expressed as g/100 g tissue wet weight. Myosin heavy chain (MyHC) polymorphism was characterized both at the mRNA level by quantitative real-time PCR amplification using the TaqMan technology and at the protein level using one dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis (1D SDS-PAGE) as previously described (25). At the mRNA level, the expression of each MyHC (MYH7 ϭ type I MyHC, MYH2 ϭ type IIa MyHC, MYH1 ϭ type IIx MyHC, MYH4 ϭ type IIb MyHC, MYH3 ϭ embryonic MyHC, MYH8 ϭ perinatal MyHC, MYH6 ϭ ␣-cardiac MyHC, and MYH13 ϭ extraocular MyHC) was calculated based on the PCR efficiencies and a calibrator, and expressed in comparison to an invariant endogenous reference gene (hypoxanthine phosphoribosyltransferase 1, HPRT1) as described by Pfaffl (26). HPRT1 expression was not affected by the fixed factors used in the statistical analysis. Types I, IIa, IIx, and IIb MyHC are the four MyHC that define muscle fiber types in adult skeletal muscle (27), whereas embryonic, perinatal, and ␣-cardiac MyHC are expressed transitorily during the fetal and early postnatal periods in pigs (28,29). At the protein level, four bands were separated by 1D SDS-PAGE and corresponded to the embryonic, perinatal, fast-type II (IIa ϩ IIx ϩ IIb) and slow-type (I ϩ ␣-cardiac) MyHC, respectively (30). Each band is expressed as a percentage of all four bands within a lane.
Proteome Analysis-Muscle total protein extraction and bi-dimensional electrophoresis were performed on the 64 muscle samples as previously described (31). Briefly, total proteins were extracted from 50 mg of powdered frozen muscle tissue homogenized into 1 ml lysis buffer containing 7.5 M urea, 2 M thiourea, 2% CHAPS, 40 mM Tris base, 50 mM DTT, 1 mM EDTA, 1% IPG buffer pH 3-11 nonlinear (NL), 1 g/ml pepstatin, and 1.5 mg/ml complete protease inhibitor mixture (Roche, Mannheim, Germany), using a handheld Teflon-pestle-glass Potter-Elvehjem homogenizer. The lysate was incubated at room temperature for 1 h with rotational shaking, followed by centrifugation at 60,000 ϫ g for 1 h at 18°C. Supernatant was collected avoiding the fat layer, aliquoted and stored at Ϫ75°C until further analysis. Protein concentration in the supernatants was determined using the RC-DC Protein Assay (Bio-Rad, Hercules, CA) with bovine serum albumin as a standard. For the first dimension, 300 g protein were loaded onto immobilized pH gradient strips (pH 3-11 NL, GE Healthcare, Uppsala, Sweden) and isoelectric focusing (IEF) was performed using an Ettan IPGphorII system (GE Healthcare) at 20°C up to a total of 88,600 Vh. At the completion of IEF, equilibrated strips were transferred onto the top of a 12.5% uniform SDS-PAGE gel using a vertical Ettan DALTsix system (GE Healthcare). After migration, gels were stained with Coomassie Brilliant Blue G-250 (Bio-Rad). The gels were scanned using an UMAX ImageScanner (GE Healthcare) and spot detection and quantification were performed by image analysis (Melanie 2D gene analysis software V7.0; Swiss Institute of Bioinformatics, Lausanne, Switzerland). Artifacts and saturated spots were removed from the analysis. Spots were matched across all 64 samples and 1,025 valid spots were successfully matched between gels. For each gel, each spot volume was expressed as a percentage of the volume of all matched spots on a given gel. Data were further log2 transformed. Representative 2D electrophoresis gels are shown in supplemental Files S1 and S2.
Statistical Rationale of Spot Selection-Protein identification is a really a posteriori time-consuming work. Statistics were done in a complete blind manner and about 200 spots were expected for mass spectrometry identification. Different statistical methods were chosen to obtain a wide spectrum of protein expression including differential analyses for depicting biological processes and discriminant analyses to identify possible markers. The purpose of each method was not the same. The analyses of variance were performed to get differentially expressed spots according to the gestational time points and the fetal genotypes. Random forests and sPLS-DA were used to find supplementary spots with a predictive power for gestational stages and/or fetal genotypes. Finally, several statistical multivariate methods (sPLS and sCCA) were also performed to find additional spots correlated with phenotypic biological characters of interest such as MyHC profiles. These methods included analyses of variance to detect spots significantly affected by gestational time points and fetal genotypes, in an additive or nonadditive manner, as described in (18). Briefly, to analyze jointly differences between breeds and gestational ages, the following mixed linear model was first fitted to each spot: Y ijk ϭ ϩ A i ϩ FG j ϩ A.FG ij ϩ S k ϩ ⑀ ijk , with i ⑀ {d90, d110}; j ⑀ {LW, MS, MSLW, LWMS}, k ϭ 1…18, S k ϳ ᏺ(, 2 e ) independent and identically distributed (iid) and ⑀ ijk ϳ ᏺ(0, 2 e ) iid. S k and ⑀ ijk are mutually independent. Y ijk is the expression of the spot being studied, a general mean of the considered gene expression and ⑀ ijk is a residual error. This model includes two fixed effects and their interaction: A i is the effect of fetal gestational age i, FG j the effect of fetal genotype j and A.FG ij the interaction effect between gestational age i and genotype j. S k represents the random sow effect. A F-type test was achieved by comparing the complete model and the reduced model: Y ijk ϭ ϩ S k ϩ ⑀ ijk . A correction for multiple testing was then implemented using False Discovery Rate (FDR). The list of differentially expressed spots (DES) was thereafter partitioned into 4 submodels. Submodel 1 combined the two fixed effects and their interaction. Submodel 2 involved the two fixed effects in an additive manner. Submodel 3 included only the fetal gestational age effect, whereas submodel 4 contained only the fetal genotype effect. All models included the random sow effect. The Bayesian information criterion (BIC) was used to associate each DES with one of these four submodels. Only spots from submodels 1 and 2 were kept for identification.
Second, random forest (RF) (32) and sparse partial least square discriminant analysis (sPLS-DA) (33)) were performed to find supplementary spots with a predictive power for gestational stages and/or fetal genotypes. Several RFs analyses were conducted. We computed a classification RF according to the experimental design and we also performed some regression RFs using phenotypes of interest (such as embryonic or adult fast MyHC). In each case, we selected spots with a high stability according to several importance criteria (Gini and Accuracy indexes for the classification RF, Accuracy and the mean decrease in MSE for regression RFs). We performed 20 RFs and kept the first fifteen spots according to those importance criteria to evaluate the stability of the variable selection procedure. This allowed us to assess how often a given spot, selected with our parameters/ criteria, is selected when running several RF procedures. In sPLS-DA, we chose to keep two axes and 25 spots by axis. The first axis discriminated the fetuses according to the gestational age, whereas the second one discriminated fetuses according to the fetal genotype. No parameter was tuned using the mixOmics R functions because we wanted to keep about 50 spots from this specific analysis and only two axes based on the PCA obtained using the normalized proteome data set.
Finally, several statistical multivariate methods (sparse partial least square (sPLS) (34) and sparse canonical correlation analysis (sCCA) (35)) were also performed to find additional spots correlated with phenotypic biological characters of interest such as MyHC profiles, plasmatic values and muscle glycogen content. In sPLS analysis, 20 and 30 spots were kept according to axis 1 and 2, respectively. In sCCA, about 50 spots were also kept to get as many spots as with sPLS analysis. Like in sPLS-DA, the first axis discriminated the fetuses according to the gestational age and the second one according to the fetal genotype. In total, 179 spots were selected and manually excised from preparative gels loaded with 600 g of proteins from pooled samples for further identification by nano-LC-MS/MS, as described in (36).
An R script is provided as additional file to allow outside labs to reproduce all these analyses (supplemental File S3). Raw data used in the R script can be found in supplementary files gel2D_phenoData_ Voillet_2016.csv and gel2D_Voillet_2016.csv.
LC-MS/MS Analyses of Selected Spots-Sequencing grade modified trypsin (Promega, ref. V511A) was used to generate peptides. In-gel tryptic digestion and protein identification by mass spectrometry were performed at the proteomic facilities in Clermont-Ferrand (PFEMcp, INRA, Clermont-Ferrand Theix, France). Tryptic peptides were analyzed by LC-MS/MS using nano-LC system Ultimate 3000 TM RSLC (Dionex, Voisins le Bretonneux, France) coupled on-line to an LTQ VELOS mass spectrometer (ThermoFisher Scientific, Courtaboeuf, France) operated in a CID top 5 mode (i.e. one full scan MS and the five major peaks in the full scan were selected for MS/MS). For database searches and protein identification, Thermo Proteome Discoverer 1.4 software was used with Mascot (Mascot server v2.2, http://www.matrixscience.com) to submit MS/MS data to the Swis-sProt sequence database restricted to Sus scrofa (UniP Sus scrofa, 26127 sequences, June 6th, 2014 release). The following parameters were chosen for the searches: the mass tolerance for parent and fragment ions was set to 1.5 Da and 0.5 Da, respectively, and a maximum of two missed cleavages was allowed. Variable modifications were methionine oxidation (M) and carbamidomethylation (C) of cysteine. Results were scored using the probability-based Mowse algorithm, where the protein score is Ϫ10*log(P) and P is the probability that the observed match is a random event. Protein identifications were considered valid if at least two peptides with a statistically significant Mascot score Ͼ 36 were assigned, and at least 20% sequence coverage was required. The accuracy of the experimental to theoretical isoelectric point and molecular weight were also considered. Among the 179 selected spots, 120 spots, corresponding to 89 unique proteins, were successfully identified (supplemental Files S4 and S5). Raw mass spectrometry data have been deposited in the publicly accessible repository MassIVE site (massive.ucsd.edu) with the dataset identifier MSV000081117.
SDS-PAGE and Immunoblotting-An aliquot of extracted proteins as used in the proteomic analysis was denatured in 1X Laemmli solution (Sigma-Aldrich, St. Louis, MO, USA, ref S3401) at 95°C during 5 min. Equal amounts of protein (30 g per lane) were loaded onto 1.5 mm thick polyacrylamide gels (4 and 10% acrylamide in the stacking and resolving gels, respectively) set on a Bio-Rad Mini-Protean tetra electrophoretic system as previously described (37). After migration in 1X running buffer (Bio-Rad, Marnes-la-Coquette, France, ref. 161-0772), separated proteins were transferred overnight at 4°C to polyvinylidene fluoride membranes at 30 V using 1X transfer buffer (Bio-Rad, ref. 161-0771). Membranes were blocked for an hour in 1X TBS-T with 5% blocking agent (Bio-Rad, ref 170 -6404) at room temperature, and immunoblotted overnight at 4°C with primary monoclonal antibody specific for CKMT2 (Abcam, Cambridge, MA, ab131179 1:1,000 dilution). Blots were washed five times for 5 min with TBS-T before incubation for 1 h at room temperature with the anti-mouse (from sheep) immunoglobulin G horseradish peroxidase conjugated secondary antibody (GE Healthcare, 1:5000 dilution). Finally, the membranes were washed with TBS-T, followed by development using an Enhanced Chemiluminescence Western blotting kit (ECL-Prime, GE Healthcare). Images were scanned with an Im-ageQuant LAS 4000 (GE Healthcare) and protein bands were quantified using the ImageQuant TL program. Membrane normalization was based on a common sample repeated on every gel in three replicates and CKMT2 expression was expressed as arbitrary units (AU) based on the ratio between CKMT2 signal of a given experimental sample and CKMT2 signal of the common sample within the membrane. Beta-actin and tubulin could not be used as normalizing factor because their expression was highly variable and influenced by the experimental factors.
RNA Preparation and Gene Expression Data Set-Total RNA was isolated from each of the 64 muscle samples as previously described in Voillet et al. (18). After quality control and quantile normalization steps, 61 microarrays containing 44,368 probes were kept. The raw and normalized data are available in the Gene Expression Omnibus under the accession number GSE56301.
Integration of Proteome Network Data-Network analysis was performed using a three step approach as illustrated in Fig. 1 and at the two developmental time points separately (90 and 110 dg (32 individuals per condition)), resulting in two global networks. We chose to infer two networks to avoid the high gestational age effect already observed in a previous study analyzing a transcriptomic data set with the same individuals (18). All analyses were performed using the R computing environment (39).
The first step consisted in inferring a proteome co-expression network using the PCIT (partial correlation and information theory) algorithm (40) (Fig. 1A). PCIT has already been successfully used in transcriptomic analyses (41,42). In the present study, we performed PCIT on a proteomic data set. PCIT belongs to the family of weighted network algorithms and is based on the combination of the concept of partial correlation coefficient and the information theory to identify meaningful associations. Briefly, the co-expression arrangements for all triplets are compared, with all triplets being exhaustively explored. PCIT estimates the correlation for each pair of proteins taking the presence of a third protein into account. Significant correlations establish an edge in the reconstruction of the network. Even though PCIT is a soft-thresholding method, the number of selected edges was also adjusted according to the network density (around 5% for each network) to obtain readable networks. This first step of the analysis was performed with the R package PCIT (43).
The second step consisted in network clustering to find communities within networks (Fig. 1B). For each network, a spinglass model was performed to optimize the modularity Q (44) and to cluster nodes (45). The modularity Q is a measure of the quality of communities (or clusters) in a network: highly connected genes within each community and weakly connected genes between communities (44). As already described in (46), a permutation test (100 permutations of edges corresponding to 100 random networks with the same degree distribution) was used to declare whether or not the clustering was significant.
Proteome Network Community Analysis and Relationship with a Phenotype of Interest-The third step consisted in the biological analysis of each community (or subnetwork). GeneMANIA networks (47) were used to explore and confirm the relevance of the proposed communities. Gene ontology (GO) enrichment analysis using the Ge-neCodis webservice (48) was used to identify the biological functions represented by each community (Fig. 1C). For the significance of GO enrichment, multiple testing was controlled using the false discovery rate (FDR) approach (49). The third step included highlighting some nodes in each community. As already reported in other studies (18,46), the betweenness of nodes within each community was analyzed (Fig. 1C). A permutation test with 1000 permutations was performed to check if the betweenness centrality was significant with respect to the node's degree. A significant result (FDR Ͻ 0.05) indicated that a node was more central in the network than expected. Therefore, betweenness centrality is a good measure of the importance of the node in the network. A node with a high betweenness (or centrality) value has a marked influence on the structure of the network. All these analyses were performed using the R package igraph (50). If no node with significant betweenness was found, the nodes with the highest betweenness were highlighted. In an integrative strategy, phenotypic information was also added to the co-expression network. Links between subnetworks and biological phenotypes of interest were investigated (Fig. 1C). To this end, methods coming from spatial statistics were used as described in (13,51). First, the correlation between each protein expression community and the phenotype of interest was calculated. Second, a Moran's I was calculated to measure the spatial correlation between the subnetwork structure and the phenotype of interest. Then, to check if the correlation between the biological phenotype and the subnetwork was significant (p Ͻ 0.05), a permutation test with 1000 node permutations was computed. All the graphs were laid out using the Force Atlas layout (52). The degree (the number of adjacent edges) is indicated by the size of the node, and betweenness is indicated by the color of the node. The colors of the edges show whether the correlation between nodes is positive (in red) or negative (in blue). All these analyses were performed using Gephi software (53).
Gene-Protein Integration-Among the 89 unique proteins identified, 81 corresponding genes were available in the muscle transcriptome data set (18). When multiple probes mapped to the same gene, the highest differentially expressed probe according to the interaction between gestational time points and fetal genotypes was retained. Transcriptomic and proteomic analyses were carried out using the same 64 muscle samples, but 60 fetuses were available in both proteomic and transcriptomic data sets. To identify proteins with possible transcriptional regulation, for each protein within each fetal genotype, a Pearson's correlation was computed between mRNA (from the gene expression data set (18)) and protein expression levels. A test was also run to assess if the correlation was significantly different from zero (p Ͻ 0.01). Multiple testing was controlled using the FDR approach (49).
From a biological point of view, the Ingenuity Pathway Analysis (IPA, Qiagen Redwood City, www.qiagen.com/ingenuity) software was used to check enrichment analysis (biological functions and canonical pathways), to construct bibliographic networks and regulation networks based on the identification of potential upstream regulators. A focus was given to subnetwork 2 at 110 dg, because of its relevant spatial correlation with two biological phenotypes of interest and a high number of proteins with a possible transcriptional regulation. Briefly, IPA constructed two separated networks. The first one was based on bibliographic data in which the edges were obtained from biological links such as receptor-ligand interactions, enzyme activity on another protein, or a transcriptional factor activating the expression of targeted genes. IPA proposed the most probable network with an associated score. IPA also identified upstream regulators with a statistical likelihood of targeting some of the genes or proteins in the network. Finally, IPA generated a regulated network with the latest information. The proposed network (described in the Results section) was reconstructed from both IPA networks by integrating direct and regulated links. The PathDesigner function of IPA was used to draw a final graph with all previous information plus the information related to the correlation between transcripts and proteins when the information was available from the transcriptomic data.

Analysis of MyHC Polymorphism and Muscle Glycogen
Content-MyHC mRNA and protein levels and muscle glycogen content are listed in Table I. All these biological phenotypes were differentially expressed according to the interaction between the two developmental time points (90 and 110 dg) and the four fetal genotypes (MS, LW, MSLW, and LWMS), except slow (I ϩ ␣-cardiac) MyHC at the protein level. The expression of IIb and extraocular MyHC were undetectable at 90 and 110 dg. In all four genotypes, embryonic and perinatal MyHC (mRNA and protein levels) decreased at the end of gestation, whereas fast (IIa ϩ IIx ϩ IIb) and slow (I ϩ ␣-cardiac) MyHC increased during the maturation process. Embryonic MyHC was more highly expressed in LW than in MS fetuses at 90 dg, whereas a higher value of fast (IIa ϩ IIx ϩ IIb) MyHC was observed in MS than in LW at 110 dg. A high correlation between mRNA and protein levels was found for both embryonic and adult fast MyHC (Pearson's correlation between mRNA and protein levels equal to 0.95 and 0.93, respectively) (Fig. 2).
The profile of muscle glycogen content resembled that of adult fast MyHC between 90 and 110 dg with a 2.6-fold increase during the maturation process (Table I). At 90 dg, MS and LWMS fetuses already exhibited higher muscle glycogen content than LW and MSLW fetuses. At 110 dg, levels reached 12.2% in MS and LWMS versus 10.5% and 9.5% in MSLW and LW, respectively (p Ͻ 0.001). Fig. 2 shows the corresponding changes in muscle glycogen content and both embryonic and adult fast MyHC (IIa ϩ IIx ϩ IIb). Muscle glycogen content was particularly well correlated with embryonic MyHC at 90 dg and adult fast (IIa ϩ IIx ϩ IIb) MyHC at 110 dg. The 64 fetuses used for this analysis were the same fetuses as those used in the proteome and transcriptome analyses. We considered eight conditions according to two developmental time points (d90 and d110) and four fetal genotypes (MS, LW, MSLW and LWMS). 1  Because MS piglets are reported to be more mature than LW piglets at birth (15), embryonic MyHC, fast (IIa ϩ IIx ϩ IIb) MyHC and muscle glycogen content are likely good descriptors of muscle physiological maturity in newborn pig. These biological phenotypes were consequently used in the proteomic and transcriptomic analyses to help identify proteins and genes potentially involved in the skeletal muscle maturation process in late fetal stages.
Multiple Statistical Analyses to Select Potentially Relevant Protein Spots-As described in Materials and Methods, several statistical methods were conducted to obtain a large spectrum of proteins related to the experimental design. One hundred seventy-nine spots were selected among the 1025 spots matched between gels. After protein identification, 120 spots, corresponding to 89 unique proteins, were successfully identified by LC-MS/MS and 18 proteins exhibited different isoforms. To facilitate data interpretation and computation, protein isoforms were filtered to exclude redundant isoforms (i.e. highly positively correlated isoforms). When multiple highly positively correlated isoforms (Pearson's correlation Ͼ 0.9) mapped to the same protein, only the most differentially expressed isoform for the interaction between developmental time points and fetal genotypes was retained. Isoforms without a high positive correlation were kept (Pearson's correlation Ͻ 0.9). Finally, a total of 113 proteins were retained and used for further analyses (supplemental File S5).
The distribution of the 113 identified spots among the different statistical analyses is illustrated as a Venn diagram in Fig. 3. Seventy-one spots were significantly differentially expressed of which 13 were affected by the interaction and 58 by the additive model between gestational time points and fetal genotypes. Using RF and sPLS-DA analyses to obtain spots that distinguished gestational stages and/or fetal genotypes, 55 spots were selected. Then, 62 spots were also selected using several multivariate methods to obtain spots correlated with the biological phenotypes of interest (MyHC polymorphism and glycogen content). Altogether, many spots overlapped between the multivariate and discriminant analyses, so that 179 spots were finally selected. After spot identification by mass spectrometry and some isoform removing, 113 proteins were identified.
The classification of the 113 identified proteins according to the biological axes is presented as a level plot in Fig. 4. Proteins involved in energy metabolism processes, such as glucose metabolism, gluconeogenesis, oxidation-reduction activity and mitochondrion, were mostly overexpressed at 110 dg in all genotypes. The figure also shows that oxidationreduction related to mitochondria at 110 dg increased in the order LW Ͻ MSLW Ͻ LWMS Ͻ MS. On the other hand, proteins involved in muscle development, such as system development, actin skeleton, muscle filament sliding, cytoskeleton and mRNA metabolic process were mostly overexpressed at 90 dg in all genotypes. All enriched biological functions and cellular components are not shown in this figure because we chose to highlight only important GO terms with FIG. 1. Overview of the network data integration analysis. A, Proteome networks were first inferred using the PCIT (partial correlation and information theory) algorithm for each developmental time point: 90 days of gestation (dg) and 110 dg. PCIT, which belongs to the family of weighted network algorithms, is based on the combination of the concept of partial correlation coefficient and the information theory to identify meaningful associations. B, Network clustering was then performed using a spin-glass algorithm. A permutation test was performed to assess the significance of clustering. C, Sub-networks (clusters) were analyzed. (i) GeneMANIA was used to assess the relevance of the proposed subnetworks and GO enrichment analysis was performed to analyze the biological functions of the subnetworks. (ii) Centrality (or betweenness) of nodes was analyzed using a permutation test. (iii) Using spatial statistics tools, correlations between subnetworks and biological phenotypes of interest were analyzed. a high number of proteins. For that reason, some proteins do not belong to any of the biological processes shown.
Proteome Subnetwork Analysis-Network inference was performed at each developmental time point according to the three-step inference method presented in Fig. 1. The largest connected component of the d90 and d110-proteome networks are presented in supplemental File S6 and characteristics of these networks (degree, betweenness and clustering) are summarized in supplemental File S7 and 8, respectively.
At 90 dg the largest connected component of the d90proteome network was composed of 94 nodes and 314 edges (density of 7.2%) and followed a scale-free topology denoting a nonrandom organization of the network (10). After node clustering, five subnetworks were obtained (Table II and Fig.  5A). All five subnetworks displayed between 16 -21 nodes and 26 -55 edges. Supplemental File S9 displays all five subnetworks, and supplemental File S10 shows the enriched Gene Ontology (GO) terms for each subnetwork. In several subnetworks, some isoforms for the same protein were linked because only one developmental time point was used for the network inference. Fig. 5A shows that subnetworks 1, 2, and 4 had a high number of edges in common, and shared a GO term corresponding to muscle filaments. Notably, these three subnetworks were also those that were significantly and spa- tially correlated with the three biological phenotypes of interest (Table II). Fig. 6 shows the three subnetworks (subnetworks 1, 2, and 4) that were significantly and spatially correlated with the biological phenotypes of interest. Subnetwork 1 was correlated with glycogen and embryonic MyHC, subnetwork 2 to glycogen and adult fast (IIa ϩ IIb ϩ IIx) MyHC, whereas subnetwork 4 was only correlated with embryonic MyHC.
According to GO functional enrichment analysis, each subnetwork was related to several biological functions (Table II): subnetwork 1 was mainly related to myofibril (GO cellular component (CC)), glycolysis (GO biological process (BP)) and mitochondrion (CC), subnetwork 2 to actin filament (CC), gluconeogenesis (BP) and mitochondrion (CC), and subnetwork 4 to muscle filament sliding (BP), gluconeogenesis (BP) and cell cycle checkpoint (BP). Sub-network 3 was mainly related to the respiratory electron transport chain (BP), cell redox homeostasis (BP) and mitochondrion (CC), whereas subnetwork 5 was related to the creatine metabolic process (BP) and sarcolemma (CC). It is important to note that some identical GO terms were enriched in several subnetworks because many proteins identified exhibited different isoforms and these isoforms could be present in different subnetworks. In addition, the proteins we identified could also be involved in several metabolic functions.
To go deeper into the biological interpretation of the clusters, we chose to highlight the three subnetworks correlated with the biological phenotypes of interest ( Fig. 6 and Table II). Sub-network 1 showed a significant correlation with muscle glycogen content and embryonic MyHC. The enriched GO biological processes, such as glycolysis, glucose metabolic process and gluconeogenesis, agreed with the glycogen correlation. FETUB (Fetuin-B, a member of the cysteine protease inhibitor family -involved in the negative regulation of endopeptidase activity) and OXCT1 (Succinyl-CoA:3-ketoacid-coenzyme A transferase 1, mitochondrial -involved in ketone body catabolic process) exhibited the highest degree in this subnetwork. In addition, FETUB exhibited the highest betweenness and was more highly expressed in LW than in MS fetuses. Muscle filament sliding was also one of the enriched GO terms describing this subnetwork. In subnetwork 2, one isoform (isoform 2) of PSMC5 (Proteasome 26S Regulatory Subunit, ATP-dependent degradation of ubiquitinated proteins, 5) had significantly high betweenness and the highest degree in this subnetwork. This isoform of PSMC5 was more highly expressed in MS than in LW fetuses (Fig. 7). Three isoforms (isoforms 1, 2, and 4) of GPD1 (Glycerol-3-phosphate dehydrogenase 1, cytoplasmic, involved in the glycerol phosphate shuttle) were also identified in this subnetwork. Interestingly, these three isoforms were more highly expressed in MS than in LW fetuses, whereas GPD1 isoform 3 was less expressed in MS than in other genotypes (Fig. 7). Muscle filament sliding was also identified as a significant GO term to characterize subnetwork 2. Sub-network 4 was correlated with the embryonic MyHC. One isoform (isoform 2) of TNNT3 (Troponin T type 3) had significantly high betweenness and the highest degree in this subnetwork. The enriched GO biological processes, e.g. muscle filament sliding and development and cell cycle checkpoint, were in agreement with the values of the phenotypic correlation. Two other TNNT3 isoforms (isoform 1 and 3) were also present in this subnetwork. It is noteworthy that a common GO term applied to subclusters 1, 2 and 4 related to muscle filaments, which could denote a strong involvement of muscle filaments in the maturational process at 90 dg, as previously highlighted on the level plot in Fig. 4.
At 110 dg, the largest connected component of the d110proteome network was composed of 86 nodes and 313 edges (density of 8.6%) and had scale-free topology denoting a nonrandom organization as observed at 90 dg. We obtained six clusters (Table III and Fig. 5B). Supplemental File S11 shows all six subnetworks and supplemental File S12 shows the enriched GO terms for each subnetwork. Among these six subnetworks, five subnetworks displayed more than 10 nodes and 19 edges. Four subnetworks were spatially correlated with muscle glycogen content (subnetworks 3, 4, 5, and 6), whereas three subnetworks were spatially correlated with the embryonic MyHC (subnetworks 2, 3, and 4) or the adult fast MyHC (subnetworks 2, 3, and 5).
GO term enrichment analysis was performed for each subnetwork (Table III) and was primarily involved in the tricarboxylic acid cycle (BP), respiratory electron transport chain (BP), glucose metabolic process (BP) and muscle filament sliding (BP), (3) subnetwork 3 was involved in gluconeogenesis (BP) and glycolysis (BP), (4) subnetwork 4 in the mRNA metabolic process (BP) and in the proteasome complex (CC), (5) subnetwork 5 in muscle development (BP) and also lipid metabolism (BP), and (6) subnetwork 6 was mainly involved in gluconeogenesis (BP), ATP catabolic/anabolic process activity (BP) and mitochondrion (CC).
As previously, we chose to highlight the five subnetworks correlated with a biological phenotype of interest (Fig. 8). Sub-network 2 was correlated with both adult fast and embryonic MyHC and had the highest number of nodes (24) and edges (65). This subnetwork was primarily characterized by GO terms corresponding to mitochondrial oxidative metabolism (supplemental File S12). ATP5A1 (ATP synthase subunit alpha, mitochondrial) showed significant high betweenness and CKMT2 (Creatine kinase, mitochondrial 2) exhibited the highest degree of the subnetwork. Both CKMT2 and ATP5A1 were more highly expressed in MS than in LW fetuses (Fig. 7). These proteomic results were validated for CKMT2 by Western blotting carried out on the 64 fetuses (Fig. 9). Briefly, CKMT2 increased dramatically between 90 and 110 dg and was greater in pure MSMS fetuses at 110 dg, whereas hybrid MSLW and LWMS were intermediate between pure LWLW and pure MSMS fetuses. This was also supported by enzyme activities with a greater increase in activities of CS and HAD between 90 and 110 dg in pure MSMS fetuses (ϩ283 and ϩ122% for CS and HAD, respectively) than in pure LWLW fetuses (ϩ172% and ϩ87%, respectively) (Fig. 10). It is worth noting that the differential increase between MSMS and LWLW fetuses was more important for CS than HAD activities. Besides that, the activity of LDH did not differ between MSMS and LWLW fetuses at 90 dg, remained stable in LWLW up to 110 dg and increased moderately (ϩ31%) in MSMS fetuses (Fig. 10). Sub-network 3 was significantly correlated with all three phenotypes of interest and was mainly involved in glucose metabolic process, gluconeogenesis and glycolysis occurring in the cytoplasmic compartment. Gelsolin (GSN), involved in actin filament reorganization, was the protein with the highest betweenness, but also the highest degree of this subnetwork with PDIA3 also known as GRP58 (protein disulfide-isomerase A3-involved in protein folding). Sub-network 4 was correlated with the embryonic MyHC and muscle glycogen content and mostly involved in the proteasome complex and the mRNA metabolic process. An isoform of PSMC5 (isoform 1) had the highest betweenness. This protein and DDAH1 (Dimethylarginine dimethylaminohydrolase-involved in arginine metabolic process) had the highest degree of this subnetwork. Muscle glycogen content and adult fast MyHC were the two biological phenotypes correlated with subnetwork 5. PGAM1 (Phosphoglycerate Mutase 1-involved in the glycolytic process) was the protein with the highest degree, whereas isoform 2 of LDB3 (LIM domain binding 3-involved in sarcomere organization) had the highest betweenness. Sub-network 6 was significantly correlated with muscle glycogen content, which is consistent with the enriched GO terms (e.g. gluconeogenesis and ATP catabolic/ anabolic process). An isoform of PSCM5 (isoform 2) had the highest betweenness and the highest degree of this subnetwork. Thus, PSMC5 isoform 2 had a major influence on the structure of subnetwork 6 at 110 dg and subnetwork 2 at 90 dg, whereas PSCM5 isoform 1 played a key role in subnetwork 4 at 110 dg. Two isoforms (isoform 2 and 4) of GPD1 were also present in subnetwork 6 and under-expressed in LW fetuses (Fig. 7).  Identification of mRNAs Correlated with Proteome Networks-For each protein with a corresponding gene in the transcriptomic data set, a Pearson's correlation was computed between mRNA and protein expression levels within each extreme fetal genotype in order to identify proteins with possible transcriptional regulation. The transcriptomic data set was previously used by Voillet et al. (18) and 60 fetuses were represented in both the transcriptomic and proteomic data sets. Fig. 11 shows the correlation between transcriptomic and proteomic data in MS (x axis) and LW (y axis) genotypes. Among the 81 proteins available in the transcriptomic data set, 31 proteins exhibited a significant positive correlation between mRNA and protein expression levels in both extreme fetal genotypes, 8 proteins in LW only (mostly involved in muscle filament sliding and located in cytosol) and 13 proteins in MS only (with no specific biological enrichment) ( Fig. 11 and supplemental File S13). On the other hand, one protein and its associated isoform (TNNT3 isoform 6) was found to have a significant negative correlation between mRNA and protein expression levels in both extreme genotypes, two proteins in LW only (TNNT3 isoform 1 and LDB3 isoform 1) and two proteins (TNNT3 isoform 7 and GSN isoform 1) in MS only (supplemental File S13).
For a deeper biological interpretation, we focused on the 31 proteins with a significant positive correlation between mRNA and protein expression in both MS and LW pure fetuses, and subnetwork 2 at 110 dg because of its relevant spatial correlation with two biological phenotypes of interest (adult fast and embryonic MyHC). Thanks to the correlation analysis, we showed that 10 nodes (seven belonging to the mitochondrion cellular component) among the 24 nodes of subnetwork 2 exhibited a positive correlation between mRNA and protein expression and were likely transcriptionally regulated (supplemental File S13), especially CKMT2 and ATP5A1. An IPA analysis of this subnetwork (using all nodes) identified possible transcription factors (TFs) affecting these proteins, in particular PPARGC1A (peroxisome proliferator-activated receptor gamma coactivator 1-alpha) with an effect on CKM (creatine kinase, muscle, cytoplasm), ATP5A1, CKMT2, and ACADVL (very long-chain specific acyl-CoA dehydrogenase, mitochondrial -involved in fatty acid beta-oxidation) (Fig. 12). The PPARGC1A gene (available in the transcriptomic data set) was overexpressed at 110 dg compared with 90 dg. It is interesting to note that PPARGC1A expression was lower in LW than in MS at 90 dg. This difference between genotypes, only observed at 90 dg, suggests that the increase in PPARGC1A expression was delayed at 90 dg in LW, which could have subsequently reduced protein expression of CKM, ATP5A1, CKMT2, and ACADVL at 110 dg in LW versus MS fetuses. ESR1 (Estrogen receptor alpha) was also identified as a possible regulator through its actions on proliferation and differentiation of muscle cells via MACROD1 (MACRO Domain Containing 1), PDIA3 (isoform 2), SEPT2 (Septin 2), ANXA2 (Annexin A2) and GSN (Fig. 12). The ESR1 gene (also available in the transcriptomic data set) was overexpressed at 110 dg with a higher expression in MS than in LW at both 90 and 110 dg (see insert in Fig. 12). IPA identified KCNJ11, a subunit of FIG. 6. Three subnetworks obtained at 90 days of gestation. A PCIT model was used to infer a co-expression network. Nodes were clustered using a spin-glass model. The figure shows three (out of five) subnetworks (1, 2, and 4) obtained by clustering. The color of the nodes corresponds to betweenness centrality with red nodes exhibiting the highest betweenness. The size of the nodes indicates degree. The color of the edges corresponds to the correlation sign: red and blue for positive and negative correlations, respectively. the ATP-sensitive K ϩ channel (KATP), as a regulator of many proteins involved in energy metabolism that were up-regulated between 90 and 110 dg and overexpressed in MS com-pared with LW at 110 dg. Expression of KCNJ11 at the mRNA level increased between 90 and 110 dg but did not differ between genotypes (see insert in Fig. 12). However, KCNJ11  8. Five subnetworks obtained at 110 days of gestation. A PCIT model was used to infer a co-expression network. Nodes were clustered using a spin-glass model. This figure shows five (out of six) subnetworks (2, 3, 4, 5, and 6) obtained by clustering. The color of the nodes corresponds to betweenness centrality with red nodes exhibiting the highest betweenness. The size of the nodes indicates degree. The color of the edges corresponds to the correlation sign: red and blue for positive and negative correlations, respectively.

TABLE III Network characteristics at 110 days of gestation
Six subnetworks (communities) were obtained at 110 days of gestation. a Nodes with a high betweenness. b The phenotype correlations with MyHC embryonic (Emb.), adult fast and muscle glycogen content were analyzed using spatial statistics tools. c Additional File 12 shows all significantly enriched GO terms for each sub-network. * Significantly high (P Ͻ 0.05) betweenness using a permutation test. is not a transcription factor but is involved in the regulation of K ϩ ions in response to the ATP/ADP ratio (54) and cannot be considered as a potential upstream transcriptional regulator of muscle maturity. Finally, IPA analysis identified several myogenic factors whose expression decreased between 90 and 110 dg, and was lower in MS than in LW at 110 dg (see insert in Fig. 12), in accordance with better maturity of MS piglets at birth.

Embryonic MyHC, Adult Fast (IIa ϩ IIx) MyHC and Muscle Glycogen Content Are Good Descriptors of Neonatal Muscle
Maturity-In the present experiment, the adult fast IIa and IIx MyHC were already expressed before birth, whereas no IIb was detected, which is in accordance with results obtained by (55). In a previous study, we also found that ␣-cardiac MyHC was transitorily expressed in pig skeletal muscle shortly after birth (29), and the present results show that the ␣-cardiac MyHC is already expressed in late gestation. In all genotypes, embryonic and perinatal MyHC (mRNA and protein levels) decreased at the end of gestation, whereas fast (IIa ϩ IIx) and slow (I ϩ ␣-cardiac) MyHC increased during the maturation process. These results are consistent with a previous review (56) dealing with the ontogenesis of skeletal muscle in farm species. In addition, because of the high correlation between mRNA and protein levels (Fig. 2), our results suggest a transcriptional regulation of MyHC expression during the maturation process. Lefaucheur et al. (57) already showed that expression of adult MyHC was very similar at the mRNA and the protein levels in pig skeletal muscle at 60 kg body weight. This transcriptional regulation has also been observed for other MyHC isoforms and confirms the regulation of MyHC gene expression at a transcriptional level (58).
The maturation of skeletal muscle contractile and metabolic properties has been reported to influence piglet maturity at birth (59,60). Among the MyHC phenotypes, embryonic and adult fast (IIa ϩ IIx) MyHC appear to be good markers of muscle maturity at 90 and 110 dg, respectively (Table I). Indeed, because embryonic MyHC was highly expressed at 90 dg and decreased drastically in late gestation, it is likely a good marker of muscle immaturity at 90 dg (easily measurable because highly expressed), and its higher expression in LW than MS points to a lower maturity of LW than MS fetuses at 90 dg. Conversely, fast (IIa ϩ IIx) MyHC was highly expressed at 110 dg and increased dramatically in late gestation, making it a likely good descriptor of muscle maturity around birth, and its higher expression in MS than in LW fetuses also points to a higher maturity of MS than LW fetuses at 110 dg. Interestingly, the hybrid fetuses were mostly intermediate between pure genotypes. In pigs, the adult fast IIa, IIx and IIb MyHC progressively replace the developmental MyHC (embryonic and perinatal MyHC) in late gestation and early after birth (56). Because MS neonates are known to be more mature than LW, with a lower rate of mortality at birth (15), the lower expression of embryonic MyHC at 90 dg and the higher expression of adult fast (IIa and IIx) MyHC at 110 dg are representative of a better muscle maturity in MS than in LW fetuses in late gestation, the hybrid fetuses being in an intermediate position.
In addition, a good correspondence was found between the increased expression of adult fast (IIa ϩ IIx) MyHC and muscle glycogen content between 90 and 110 dg (Fig. 2). Energy reserves, e.g. glycogen and lipids, must be maximal in the neonatal period because piglets are not able to oxidize protein efficiently before 5-7 days of life (59). In pig neonates, muscle glycogen content is very high (about 9% of fresh muscle weight versus 1% in growing pigs (59)) and plays a key role in whole body glycogen storage (89% of total body glycogen at birth (59)) and thermoregulation because neonatal pigs are devoid of brown adipose tissue and only have a small amount of adipose tissue (about 1.5% of body weight). The animal's energy requirement is maximum in the neonatal period to promote locomotion, thermoregulation and growth. Therefore, a high level of muscle glycogen at birth is likely involved positively in neonatal maturity and piglet survival. The higher muscle glycogen content found in MS fetuses in the present study is in agreement with the better maturity previously reported in MS compared with LW neonates (15). Interestingly, a difference between the two extreme breeds in expression of genes involved in glycogen metabolic processes was already observed during the same fetal period in the same individuals by a transcriptomic approach (18). Thus, some genes were overexpressed at 110 dg in MS only, such as PCK2 (phosphoenolpyruvate carboxykinase 2, mitochondrial also known as PEPCK 2) likely involved in gluconeogenesis from precursors derived from the citric acid cycle (18). Therefore, the significant difference in muscle glycogen content between MS and LW observed in the present study can be, at least partly, explained by these differences in gene expression. Interestingly, MS fetuses also exhibited a greater PYGM (muscle glycogen phosphorylase) expression at the mRNA level than LW at 110 dg, which suggests a higher capacity to use glycogen as an energy source in MS fetuses. Skeletal muscle appears to play a key role in glycogen storage and oxidative metabolism around birth. We hypothesize that piglets with high value for survival, like MS, have a higher ability to maintain glucose levels during and after farrowing, and are more able to maintain body temperature.
Taken together, the present data show that embryonic MyHC, adult fast (IIa ϩ IIx) MyHC and muscle glycogen content are good descriptors of muscle maturity in pig fetuses during late gestation and confirm that MS fetuses are more mature than LW fetuses, whereas hybrid fetuses are intermediate between pure MS and LW breeds. All these three biological phenotypes were used in the further data integration analysis to help find new biological markers of neonatal maturity and to advance our understanding of the underlying mechanisms.
Mitochondrial Oxidative Metabolism is a Key Determinant of Neonatal Muscle Maturity-Oxidative metabolism tends to increase during late gestation in all farm species making it an increasingly important source of energy during fetal life (56). To identify the biological processes underlying muscular maturity, the GO terms classification was performed on the 89 unique identified proteins. Many proteins associated with metabolic processes were found. However, 2D gel electrophoresis is known to reveal only a limited collection of highly abundant and soluble proteins (61) covering a limited number of cell functions, mainly dealing with cell structure and metabolism, which is why proteomic and transcriptomic approaches are not always comparable (62).
Network analysis was performed and computed at the two developmental time points to identify proteins with a relevant role in the network and to identify biological processes. It is important to note that, at both 90 and 110 dg, many biological processes and cellular components overlapped between subnetworks and, within each subnetwork, the enriched biological analysis often highlighted a mixture of GO terms. Indeed, it is sometimes difficult to assign a single biological function to a protein because of an involvement in interdependent metabolic pathways. Moreover, several isoforms were also found in the proteomic data set, and were diversely inter-correlated meaning that two isoforms of the same protein could be found in two different subnetworks. The GO terms overexpressed at 90 dg were mostly involved in the actin cytoskeleton, muscle development and mRNA processing (Fig. 4). As already highlighted in a previous transcriptomic study using the same individuals (18), these results are consistent with the second phase of myofiber genesis known to occur between 55 and 90 dg in pigs (56). The total number of muscle fibers increases up to ϳ90 dg and further muscle development mostly occurs through maturation and hypertrophy of existing muscle fibers (56). In addition, it has already been reported that the total number of myofibers is lower in MS than LW (57) which helps explain the lower postnatal skeletal muscle growth capacity of MS than LW pigs. Among the proteome subnetworks that were correlated with the biological phenotypes of interest (subnetworks 1, 2, and 4), all were characterized by GO terms dealing with muscle filament development and sliding, which underlined the importance of filamentous proteins and cytoskeleton at this stage. Notably, subnetwork 2 also contained three isoforms of GPD1 (isoforms 1, 2 and 4) that were overexpressed in MS fetuses (Fig. 7). GPD1 is a cytoplasmic enzyme involved in the glycerol phosphate shuttle that can transfer cytoplasmic glycolytic reducing NADH equivalents to mitochondrial FADH2 at complex 2, thus providing ATP to the cell through the respiratory chain from complex 2 (63). The higher expression of these three GPD1 isoforms in MS at 90 dg suggest increased mitochondrial oxidation of cytosolic NADH in MS, which could already contribute to the advanced maturity of MS fetuses. Surprisingly, subnetworks 3 and 5, which were mostly involved in energy metabolism, had no significant correlation with MyHC and muscle glycogen content at 90 dg. overall, our data suggest that muscle immaturity at 90 dg is primarily related to a high proportion of cytoskeletal proteins and proteins involved in myofibril assembly, even though some metabolic enzymes such as GPD1 seem to be positively correlated with the maturation process at 90 dg.
At 110 dg, the most striking differences between genotypes concerned the mitochondria cellular component, the mitochondrial oxidation/reduction molecular function (Fig. 4). Moreover, the proteome subnetwork analysis identified gluconeogenesis and glycolysis as important co-expressed pathways which could explain the higher muscle glycogen content in MS than in LW at 110 dg. Analysis of the proteome subnetwork identified five subnetworks that were correlated with the biological phenotypes of interest (subnetworks 2, 3, 4, 5, and 6). Most were characterized by GO terms involved in energy metabolism (subnetworks 2, 3, 5, and 6), whereas subnetwork 4 contained GO terms dealing mostly with the proteasome complex and mRNA processing with isoform1 of PSMC5 as the most important node in this cluster. The most relevant subnetwork was subnetwork 2 with 24 nodes and 65 edges. This subnetwork was highly correlated with adult fast and embryonic MyHC, and its GO terms primarily concerned mitochondrial energy metabolism. Interestingly, CKMT2 and ATP5A1 were identified as important nodes in this cluster, which could be physiologically relevant to produce energy such as ATP. Indeed, CKMT2 is a mitochondrial creatine kinase responsible for the transfer of high energy phosphate from the mitochondria to the cytosolic compartment, and at the same time for returning ADP to the mitochondrial respiratory system, thereby stimulating oxidative phosphorylation. Moreover, cytoplasmic muscle CKM was also overexpressed in MS at 110 dg, suggesting that the energy metabolism related to muscle contraction was greater in MS than LW. ATP5A1 encodes a subunit of the mitochondrial ATP synthase that converts the mitochondrial electrochemical H ϩ gradient to ATP production, thereby supplying ATP to the muscle. CKMT2 and ATP5A1 are likely good biological markers of muscle maturity at 110 dg ( Fig. 7 and Fig. 9). The greater expression of CKMT2 and ATP5A1 in MS at 110 dg indicates that MS fetuses possess greater oxidative capacity, which was supported by the greater enzyme activities of CS (tricarboxylic acid cycle) and HAD (lipid beta-oxidation) (Fig. 10), in accordance with previous results obtained at birth (64). Subnetwork 3 was the second most important cluster with 19 nodes and 46 edges. It was highly correlated with glycogen content as well as embryonic and adult fast MyHC, and its GO terms mostly dealt with glycolytic energy metabolism and gluconeogenesis. Sub-network 6 was highly correlated with muscle glycogen content and one of its GO terms corresponded to gluconeogenesis. Moreover, two GPD1 isoforms (isoforms 2 and 4) were also present in subnetwork 6 and were under-expressed in LW fetuses, as already observed at 90 dg (Fig. 7). Notably, expression of the four GPD1 isoforms was strongly influenced by the genotype with no effect of gestational age, and no interaction between genotype and development time points was observed (supplemental File S5). This suggests that GPD1 is genetically determined but not related to maturity as influenced by age. At the mRNA level, as already demonstrated in (18), GPD1 gene expression was lower in LW than MS at 90 dg and did not increase in LW during the maturation process, whereas it did in MS. Therefore, the gestational time point only influenced GPD1 at the mRNA level in MS. The greater GPD1 expression in MS than LW at 110 dg could be physiologically relevant through a better transfer cytoplasmic NADH to mitochondrial FADH2 (glycerol phosphate shuttle), which could contribute to an enhanced production of energy through the oxidative phosphorylation in the respiratory chain. Interestingly, the mRNA expression level of LDHA, which catalyzes the interconversion of L-lactate and NADϩ to pyruvate and NADH in the final step of anaerobic glycolysis was greater in MS than LW fetuses (18) and the decrease in blood lactate concentration between the afferent fetal vein and efferent fetal artery was greater in MS than LW fetuses (-0.18 Ϯ 0.09 versus 0.05 Ϯ 0.20 mmol/l, respectively, p ϭ 0.01, not shown), suggesting a greater uptake of lactate, mostly produced by the placenta, as an energy source in MS than LW fetuses. At 110 dg, the enzyme activity of LDH was significantly greater in MSMS than LWLW fetuses (ϩ17%, p ϭ 0.04, Fig. 10). However, LDH is a reversible enzyme which can transform either pyruvate to lactate (ϩ NADϩ) or lactate to pyruvate (ϩ NADH) depending on its isozyme composition and substrate availability, making it necessary to perform further work to better understand the real metabolic fluxes through the LDH reaction. Because LDH activity dramatically increases from birth to 15-20 days of age (65), the greater LDH activity in MSMS than LWLW fetuses at 110 dg could also be related to the better maturity of MSMS piglets at birth. Altogether, the present data show that muscle energy metabolism, mitochondrial energy metabolism and muscle glycogen storage, increased dramatically between 90 and 110 dg and were higher in MS than in LW fetuses at 110 dg, meaning that the mitochondrial oxidation/reduction processes and glycogen storage, likely involving the mitochondrial PEPCK (PCK2), play a crucial role in the late fetal muscle maturation process. In accordance with present data, a dramatic increase in muscle oxidative metabolism during late gestation has also been reported in different farm species (56).
Biological Data Integration Identified PPARGC1A as Potential Upstream Transcriptional Regulator of Muscle Maturity at Birth-To gain further insight into the molecular machinery underlying the muscle maturation process in pigs, we undertook a guided and integrated analysis of the transcriptomic and proteomic data sets. Among the 89 unique proteins identified, 81 were available in the transcriptomic data set. Using Pearson's correlations, mRNA and protein expression levels were analyzed to find proteins with possible transcriptional regulation. Thirty-one proteins were identified with a significant positive correlation between mRNA and protein expression levels in both extreme fetal genotypes (Pearson's correlation r Ͼ 0.7) (Fig. 11). Several transcriptome and proteome data integration studies have already been reported e.g. in cell lines (66 -68), plants (69), and mammal models (6, 70 -73). The correlation between mRNA and protein levels has been reported to be notoriously poor. At 110 dg, subnetwork 2 was composed of 10 nodes (out of 24) with possible transcriptional regulation in both MS and LW. Proteins belonging to this subnetwork generally exhibited similar expression profiles at 110 dg, were mostly located in the same cellular component and may possibly be regulated by common transcription factors (TFs). Based on a bibliographic network computed using IPA software (using the upstream regulator function), we were able to identify possible upstream TFs and PPARGC1A was one of them. It was an upstream regulator of CKMT2, ACADVL and ATP5A1 (Fig. 12). CKMT2 has already been identified as an important node in subnetwork 2 (the highest betweenness at 110 dg) and as a likely good biological marker of muscle maturity at 110 dg.
PPARGC1A (also known as PGC1-alpha) is a transcriptional coactivator that controls the expression of many genes through a whole range of nuclear hormone receptors and other TFs (74). Moreover, the activity of PPARGC1A is regu-lated by phosphorylation and deacetylation through the coaction of two upstream metabolic sensors of energy deficiency: AMPK and SIRT1, respectively (75). PPARGC1A is abundant in muscle (76) where it is involved in several biological functions such as mitochondrial biogenesis and oxidative metabolism, which play a key role in ATP production and the adaptation of muscle to exercise and thermoregulation (77). It has been shown to drive the formation of slow twitch fibers (78) and to be more highly expressed in oxidative myofibers (75). Therefore, together with AMPK and SIRT1, PPARGC1A is involved in carbohydrate and lipid metabolisms to maintain energy homeostasis. In post-natal growing pigs, PPARGC1A is more highly expressed in LM of Erhualian than LW pigs, in accordance with a higher proportion of oxidative fibers in Erhualian pigs (79), as well as in MS pigs (57). Voillet et al. (18) showed that PPARGC1A gene expression was differentially expressed depending on the interaction between fetal genotypes and gestational time points, with higher expression in MS than LW at 90 dg. Therefore, our results suggest that PPARGC1A could have an early effect on the subsequent greater increase in expression of genes such as CKMT2, ATP5A1, ACADVL, and CKM in MS than LW at 110 dg. Mitochondrial CKMT2 and cytoplasmic CKM are both involved in the creatine metabolic process and are known to play a key role in the transfer of energy within the muscle fiber for muscle contraction and thermogenesis. Interestingly, overexpression of PPARGC1A in the skeletal muscle of transgenic mice has been shown to increase glycogen synthesis and storage (80), which could also explain the greater glycogen content observed in MS compared with LW fetuses at 110 dg in the present experiment. In addition, greater variability of PPARGC1A expression in LW than in MS was also observed at 110 dg. Therefore, PPARGC1A could be a relevant upstream regulator involved in the accelerated muscle maturation observed in MS compared with LW in late gestation.
Cell culture experiments have shown that PPARGC1A is a coactivator of ESR1 (81). Interestingly, ESR1 activation has also been reported to increase the expression of PPARGC1A, NRF1, IRS1, and GLUT4 in skeletal muscle of ovariectomized mice fed a high fat diet (82). In our transcriptomic study, ESR1 expression increased dramatically between 90 and 110 dg and was greater in MS than LW fetuses at both stages (Fig.  12). In subnetwork 2 at 110 dg, ESR1 was found to be a regulator of several proteins mostly involved in protein folding (PDIA) and in cytoskeleton (SEPT2 and GSN). This estrogen receptor has been reported to be involved in estrogen-mediated regulation of substrate metabolism (83) and lower mRNA expression has been observed in the adipose tissue of obese compared with lean women (84). In ESR1 knockout mice, body weight was about 30% higher than in wild-type mice: mice were obese and their oxidative metabolism was impaired (85). In our study, under-expression of ESR1 in LW was accompanied by reduced oxidative metabolism but not by increased fatness (LW leaner than MS). Altogether, ESR1 increased between 90 and 110 dg, and its overexpression in MS compared with LW could be involved, in interaction with PPARGC1A, in the improved maturity of MS compared with LW animals at birth. CONCLUSION In this study, we developed an innovative strategy combining state-of-the-art statistical and computational methods to integrate multiomics data sets to gain a deeper insight into the late fetal maturation process. Three biological phenotypes of interest (i.e. adult fast and embryonic MyHC and muscle glycogen content) were identified as good descriptors of muscle maturity. Some proteins, involved in mitochondrial oxidative metabolism and with a possible transcriptional regulation were also emphasized as possible biomarkers of the maturation process. As one of the main sources of energy during late fetal life, muscle oxidative metabolism is very important at birth (56), and likely strongly influences muscle and whole body metabolic maturity at the time of birth. In pigs, a higher rate of death at birth has already been observed in genotypes with a lower percentage of oxidative fibers (15,57). In recent decades, the pig industry has been focusing on selection for rapid production of lean meat, low back-fat thickness and rapid growth rate, thus influencing muscle fiber properties (86). More precisely, a selection for lean tissue is more associated with muscles containing a low percentage of oxidative myofibers and a high percentage of large glycolytic myofibers (87). This kind of the selection could influence important genes, such as PPARGC1A and ESR1, with lower expression in skeletal muscle, which could contribute to an alteration or delay of mitochondrial gene/protein expression in late gestation and lead to muscle metabolic immaturity at birth.