Computational modeling identifies key gene regulatory interactions underlying phenobarbital-mediated tumor promotion

Gene regulatory interactions underlying the early stages of non-genotoxic carcinogenesis are poorly understood. Here, we have identified key candidate regulators of phenobarbital (PB)-mediated mouse liver tumorigenesis, a well-characterized model of non-genotoxic carcinogenesis, by applying a new computational modeling approach to a comprehensive collection of in vivo gene expression studies. We have combined our previously developed motif activity response analysis (MARA), which models gene expression patterns in terms of computationally predicted transcription factor binding sites with singular value decomposition (SVD) of the inferred motif activities, to disentangle the roles that different transcriptional regulators play in specific biological pathways of tumor promotion. Furthermore, transgenic mouse models enabled us to identify which of these regulatory activities was downstream of constitutive androstane receptor and β-catenin signaling, both crucial components of PB-mediated liver tumorigenesis. We propose novel roles for E2F and ZFP161 in PB-mediated hepatocyte proliferation and suggest that PB-mediated suppression of ESR1 activity contributes to the development of a tumor-prone environment. Our study shows that combining MARA with SVD allows for automated identification of independent transcription regulatory programs within a complex in vivo tissue environment and provides novel mechanistic insights into PB-mediated hepatocarcinogenesis.


Affymetrix GeneChip processing
The analysis of the micro-array data was done with the R statistical package version 2. 13 (2005) and Bioconductor libraries version 1.4.7 [4]. The four original data-sets containing Affymetrix CEL files were normalized independently using the Robust Multichip Average (RMA) implementation of the algorithm available in R/Bioconductor [4], producing four expression matrices, and the quality of the experiments was assessed using diverse statistics implemented in the package arrayQualityMetrics for R/Bioconductor [5].

Regulators associated with termination of developmental liver growth (⃗ v 1 )
To determine motifs underlying the four characteristic modes identified in this study, we selected motifs which contributed and correlated the most with each of the four singular vectors (Figure 3c,d,e,f ). In this way we obtained, for each of the 4 singular vectors, two clusters of motifs with similar activity profiles, i.e. one correlating negatively with the singular vector, and one correlating positively (Figures 3d,f ). We further refined the selection of the motifs associated with first singular vector as follows: 1) removing motifs for which the overall significance was lower than z < 1.5 and 2) removing motifs whose cognate TFs were not expressed in the liver (log-expression less than 6.0) log 2 e ≤ 6.0). This lead to the identification of 6 motifs motifs (Supplementary Table S1).
As originally observed in [1], completion of the post-natal liver development process occurs during the early PB-treatment time course, consisting in both hepatocyte proliferation at early stage, and progressive induction of liver-specific genes [6,7]. We here identify key regulators of these two processes: 1. we show that post-natal liver growth (that decreases over time) is regulated by known regulators of cell proliferation such as the E2F family of TFs [8,9,10], SRF [11] and Myc [12,13]; predicted target genes of these motifs have functions related to cell cycle and DNA replication (Supplementary Figure S6i), confirming the role of these regulators in cell proliferation. 2. We show that post-natal liver differentiation (which increases over time) is partly regulated by AHR, a known regulator of drug-metabolizing genes and transporters [14,15,16,17] that has been shown to play key role in liver development [18]. Thus, the main biological process associated positively with the first singular vector is cellular proliferation associated with post-natal liver growth for the first two weeks of the time course. Conversely, the targets of the motifs that are negatively associated with the first singular vector, i.e. corresponding to genes that increase their expression after the first two weeks, are enriched for functions associated with hepatocyte terminal differentiation, such as 'liver development', 'drug metabolism' and 'transcriptional regulation'.

Singular value decomposition analysis of the activity matrix of the CAR KO data-set
In order to identify and quantify the sources of motif activity changes in the CAR KO data-set, we performed Singular Value Decomposition (SVD) of the activities of the 189 motifs across the four conditions (PB-and vehicle treated livers from wild-type and CAR KO mice). Over 50% of the variance in the activity matrix was explained by the first two components of the SVD as evidenced by the spectrum of singular values ( Figure S6a).
In order to facilitate the biological interpretation of the singular vectors, we plotted the averaged activities of the right singular vectors v ks over each of the four sample groups and further identified regulatory motifs whose activity profiles correlate most strongly (either positively or negatively) with the activity profile of the singular vector. Visualization of the averaged activity of the first two singular vectors ⃗ v 1 and ⃗ v 2 in each of the four sample groups is shown in Figure S6b and scatter plots of the correlations ρ i and projections p i of all motifs i with the first and second right singular vectors are shown in Figure S6c.
The first right singular vector accounts for 33% of the variance and is characterized by a positive activity upon PB treatment in wild-type animals only. Given the absence of positive activity in CAR KO treated animals, we propose that this component represents the liver response to PB that is CAR-dependent. Moreover motifs which contribute and correlate most strongly with the first singular vector (TBP, NFE2, REST, GLI1,2,3, FOSL2, ELK1,2, and ZNF143) are all down-stream of CAR signaling under PB treatment (Table  S2) except CTCF, RXRG-dimer and STAT5{A,B}, further supporting the association of this component with the CAR-dependent liver response to PB treatment.
The second right singular vector accounts for 18% of the variance and is characterized by 1) a lower activity in wild-type liver samples compared to CAR KO samples, and 2) by an activity further lowered upon PB treatment in both wild-type and CAR KO samples ( Figure S6b). We propose that this component represents the basal liver activity down-stream of CAR that is further exacerbated upon PB treatment. However the motifs that contribute and correlate most strongly with the second singular vector do not coincide with any of the 5 motifs identified by differential motif activity analysis as down-stream of CAR signaling under physiological condition (Table S2). Furthermore the average activities have large associated error-bars for each sample group, indicating that the interpretation of this component must be considered with caution.
In conclusion, the SVD-based analysis of the activity matrix of the CAR KO data-set indicates that the major source of motif activity changes in these liver samples is the CAR-dependent liver response to PB treatment. This result is in line with the analysis based on differential motif activity. Importantly, prior biological knowledge indicates that at least two biological processes are occurring in this system, i.e the CAR KO effect and the xenobiotic response to PB treatment. Differential motif activity previously showed only a very minor CAR KO effect (only 5 motifs identified as down-stream of CAR signaling under physiological condition, see Table  S2) that may explain the absence of strong association of any component with this biological process.

Singular value decomposition analysis of the activity matrix of the tumor study data-set
In order to identify and quantify the sources of motif activity changes in the tumor data-set, we performed Singular Value Decomposition (SVD) of the activities of the 189 motifs across the four conditions (PB-and vehicle treated normal and tumorigenic liver samples). Over 57% of the variance in the activity matrix was explained by the first two components of the SVD that are the two significant components of the matrix, as evidenced by the spectrum of singular values ( Figure S7a).
In order to facilitate the biological interpretation of the singular vectors, we plotted the averaged activities of the right singular vectors v ks over each of the four sample groups and further identified regulatory motifs whose activity profiles correlate most strongly (either positively or negatively) with the activity profile of the singular vector. Visualization of the averaged activity of the first two singular vectors ⃗ v 1 and ⃗ v 2 in each of the four sample groups is shown in Figure S7b and scatter plots of the correlations ρ i and projections p i of all motifs i with the first and second right singular vectors are shown in Figure S7c.
The first right singular vector accounts for 32% of the variance ( Figure S7a) and is characterized by 1) a higher activity in PB-treated samples relative to non-treated samples, 2) an increased positive activity in promoted tumor samples relative to all other sample groups (normal treated and non-treated samples, and non-treated tumor samples) and 3) a slight decreased activity in non-promoted tumor samples relative to surrounding normal tissue ( Figure S7b). Moreover several motifs which contribute and correlate most strongly with the first singular vector (NFE2, E2F1-5, PBX1, and ESR1) as depicted in Figure S7c, have been identified as specific regulators of promoted tumors by differential motif activity analysis (see Table S4). These results indicate that motifs associated with this component are generally associated with a response to PB treatment which is further 1) exacerbated in promoted tumor samples and 2) inhibited in non-treated tumor samples, suggesting that the first component captures motifs associated with biological pathways underlying promoted tumors that are already up-regulated upon PB treatment and down-regulated in non-promoted tumors.
The second right singular vector accounts for 25% of the variance ( Figure S7a) and is characterized by an overall decreased activity in tumor samples relative to normal samples, irrespective of the PB treatment ( Figure  S7b); this suggests that the second component captures motifs associated with biological pathways underlying tumorigenesis. It is however noteworthy that none of the motifs which contribute and correlate most strongly with the second singular vector ( Figure S7c) were identified as regulators of tumorigenesis by differential motif activity analysis (Table S3). One explanation for this could be a strong variability in activity profiles leading to low Z-value of differential activity.
In conclusion the SVD-based analysis of the activity matrix allows for the identification of 1) regulators of promoted tumors (first component) which are consistent with those identified by differential motif activity analysis, and 2) regulators of liver tumorigenesis, which were not identified by differential motif activity analysis, potentially due to high noise to signal ratio.  Figure S1: Selection of representative biological terms and processes associated with the predicted target genes of motifs which activities were significantly (a) higher or (b) lower in promoted tumors relative to surrounding treated normal tissue, and in non-promoted tumors relative to surrounding non-treated normal tissue (Supplementary Table S3). Bars are colored according to motif to which the target genes are associated with. Bar height indicates significance of functional enrichment as it represents the −log10(P −Value) of functional enrichment in the given biological term or process as obtained from the DAVID Bioinformatic Resource (Database for Annotation, Visualization and Integrated Discovery) [19,20] Figure S2: Selection of representative biological terms and processes associated with the predicted target genes of motifs which activity was significantly (a) lower or (b) higher in promoted tumors relative to surrounding treated normal tissue, but that did not change in non-promoted tumors relative to surrounding non-treated normal tissue (Supplementary Table S4). Bars are colored according to motif to which the target genes are associated with. Bar height indicates significance of functional enrichment as it represents the −log10(P −Value) of functional enrichment in the given biological term or process as obtained from the DAVID Bioinformatic Resource (Database for Annotation, Visualization and Integrated Discovery) [19,20] Figure S4: Alpha fetoprotein (Afp) gene expression in liver samples from 13 week kinetic data-sets as a surrogate gene of post-natal liver development termination. Gene expression is given as mean ±SD (n=3-5 animals per group). Open bars = control. Black bars = phenobarbital-treated samples.

Abbreviations contained in Tables S1-S5
Tables S1-S5 contain motifs corresponding to specific groups that are 1. Table S1 -motifs associated with the first four singular vectors obtained from singular value decomposition (SVD) of the inferred motifs activity matrix from early kinetic study 2. Table S2 -motifs down-stream of CAR signaling 3. Table S3 -motifs dysregulated in both promoted and non-promoted tumors 4. Table S4 -motifs specifically dysregulated in promoted tumors 5. Table S5 -motifs down-stream of β-catenin signaling.
They are all formatted in the same way and their abbreviations are described in the following:    Table S1: Representative motifs of the first four singular vectors (explaining over 70% of the variance in the activity matrix) obtained from singular value decomposition of the inferred motifs activity matrix from early kinetic study, and underlying the early dysregulated biological pathways. Z-values of differential activity were computed as explained in Material and Method section of the main manuscript.     Table S3: Motifs which activities are significantly changing in promoted tumors relative to surrounding treated normal tissue, and in non-promoted tumors relative to surrounding non-treated normal tissue. These motifs are thus candidate regulators of liver tumorigenesis. Z-values of differential activity were computed as explained in Material and Method section of the main manuscript.   Table S4: Motifs which activities are significantly changing in promoted tumors relative to surrounding treated normal tissue, but not in non-promoted tumors relative to surrounding non-treated normal tissue. These motifs are thus candidate regulators of tumor promotion. Z-values of differential activity were computed as explained in Material and Method section of the main manuscript.