Self-consistent signal transduction analysis for modeling context-specific signaling cascades and perturbations

Biological signal transduction networks are central to information processing and regulation of gene expression across all domains of life. Dysregulation is known to cause a wide array of diseases, including cancers. Here I introduce self-consistent signal transduction analysis, which utilizes genome-scale -omics data (specifically transcriptomics and/or proteomics) in order to predict the flow of information through these networks in an individualized manner. I apply the method to the study of endocrine therapy in breast cancer patients, and show that drugs that inhibit estrogen receptor α elicit a wide array of antitumoral effects, and that their most clinically-impactful ones are through the modulation of proliferative signals that control the genes GREB1, HK1, AKT1, MAPK1, AKT2, and NQO1. This method offers researchers a valuable tool in understanding how and why dysregulation occurs, and how perturbations to the network (such as targeted therapies) effect the network itself, and ultimately patient outcomes.

are linearly related (see Section 4.1); and 3) leveraging transcriptomics and proteomics data from healthy tissues is a reasonable way to estimate those linear relationships, even among diseased samples.More subtle assumptions that are made include: 4) signaling activity is approximately conserved (meaning activation gets "passed on" from molecule to molecule through the network); 5) constitutive activation and spontaneous deactivation of signaling proteins and genes is biologically disfavored (Section 2.1.1 in the main text); 6) signaling networks tend to be efficient (that is, they tend toward a state of minimal total activity, see Section 2.1.1 in the main text and Supplementary Section 1.2); and 7) signaling networks are minimally impacted by perturbations (Section 2.1.2 in the main text).

Biases relating to parsimonious solutions in SCSTA
As noted in Section 2.1.1 of the main manuscript, the parsimony objective introduces certain biases in the solutions generated by SCSTA.In particular, it creates a tendency toward increased constitutive activation at or near the gene nodes.To see why, consider the trivial network motif shown in Supplementary Figure 8A.If we assume that the impinging activation of gene G is required to be 1, we immediately see that the minimal total activity flowing that can flow through this network (the parsimonious solution) occurs when e CA,G = 1 and all other edges take the value 0. Barring that, we also see that the next-most parsimonious solution occurs when e CA,C = 1, e C→G = 1, and all other edges are zero.Indeed, we can continue up the pathway like this, finding increasingly-less parsimonious solutions.This observation led to the introduction of the introduction of objective c 1 .This is not the only unintentional bias introduced by the parsimonious objective, however.Another important one is a general tendency toward decreased reliance on inhibitory interactions.Consider the network motif in Supplementary Figure 8B, and again assume we require the total activation impinging on G to be 1.In this case, barring constitutive activations, the parsimonious solution will simply involve activity flowing through the left-hand pathway (e.g. e A→B = e B→G = 1), with the right-hand pathway being unused.Indeed, for the right-hand pathway to be active requires higher levels of activity in the left-hand pathway, and larger total pathway activity in general.
The bias against such pathway activations manifests itself in a particularly striking way within the data presented in the main manuscript.For a gene to be predicted to be upregulated under inhibition of some upstream protein, the connecting pathway must involve at least one inhibitory interaction.As a result, the parsimonious objective's tendency to minimize use of such pathways leads to a decreased tendency to predict upregulation of genes, as seen in Figure 5 in the main manuscript.
In light of the overall performance of the model, and its apparent ability to prognosticate better and worse patient outcomes, the parsimonious objective used in this work still represents a reasonable starting point.Nevertheless, I believe that other objectives are worth investigating in future work.One might expect, for example, that there should exist evolutionary pressure against expression of signaling proteins that are not active, and that in turn most expressed proteins should be active at or near their maximal capacity; this might lead one to consider an objective in which total activity was maximized, perhaps using some weighting based on the proteins' degree of upregulation relative to healthy tissues.

Full self-consistency of the MPS
Here I suggest to the reader a few possible methods for ensuring full self-consistency in an MPS.
The simplest way to ensure full self-consistency may be an iterative approach, wherein an MPS solutions is sought, new N values are computed, and then they are used to update the relevant constraints, leading to yet another MPS solution (still being being minimally perturbed from the original baseline solution).This can be repeated until the changes in M values fall below some tolerance, and the solution is considered to be converged.
A second approach would be to reformulate the model such that the N values themselves are implicitly defined in terms of the M values.For example the constraint in Equation 1 of the main manuscript could be rewritten such that the right-hand side is less than or equal to the product of γ and the corresponding node's transcriptional activation (that is, the left hand side of Equation 3 in the main manuscript): This is problematic because it does not, in and of itself, account for the distribution of gene products to complexes.
Nevertheless, such an approach can be saved by also making this distribution implicit.For example, if gene product A performs interactions on its own, and is also a subunit of complexes B, C, etc., and they engage in other interactions, then if we require: and do the same for all other proteins in every complex, we effectively limit the amount of activation into each gene product and all of its associated complexes such that the total activation is bound by the expression level of the respective gene products.
1.4 Pretreatment behaviors tend to align with expected trends among ER-positive and ER-negative patients ER-positive patients have better prognoses than ER-negative patients.They tend to be less malignant and of lower grade, less proliferative, less apoptotic [1], show better differentiation (less stemmness) [2,3], and show less immune infiltrate [4].I investigated whether the TCGA patients showed the same types of trends prior to simulated ET.Patients were categorized as either ER-positive or -negative based on their reported immunohistochemistry (IHC) status, or, when unavailable or equivocal, based on gene expression data (see Methods Section 4.5 in the Main Manuscript).Welch't t-tests were then performed to determine whether each baseline behavior score differed between the two groups (see Supplementary Table 1).Significant differences were found in the cell death, cell cycle, immune, and stemness scores, each of which, except for the immune score, followed the expected trend (I note that the "Immune response," phenotype, and several others that comprise the immune score, showed the expected trend.
An insignificant difference was observed in the proliferation score, but it did follow the expected trend.Surprisingly, the metastasis score showed a significant difference, but it was in the opposite direction of that which was expected.
The immune score used in this work is comprised of 20 SIGNOR phenotypes, describing a range of different behaviors critical to immune system function.Prior to simulated ET, the ER-positive patients tended to have significantly larger immune scores than the ER-negative patients, in stark contrast to the general understanding within the field.As such, a more detailed analysis was carried out in which the values of each of SIGNOR's immunerelated phenotypes were computed and t-tests were performed to determine which phenotypes showed significant differences.Broadly speaking, the ER-positive patients tended to show: lower "Macrophage activation", lower "M1 polarization" and higher "M2 polarization", and higher "Macrophage differentiation"; lower "T-reg differentiation"; higher "T-lymphocyte differentiation" and "T-cell activation"; higher "B-lymphocyte differentiation" but lower "Bcell maturation"; higher "Basophil differentiation", and as noted above, lower "Immune response" (see Supplementary Table 2).
The metastasis score is much more problematic, and indeed I consider it essentially meaningless.It is composed of only a single SIGNOR phenotype, and not only does it not not follow the expected pretreatment trend, even the post-treatment score fails to align with reported M-stage (see Supplementary Figure 2).It is possible that the inclusion of multiple phenotypes in the other scores increases their robustness.

Epithelian-mesenchymal transition portends low sensitivity to endocrine therapy
Epithelial-mesenchymal transition is known to confer insensitivity to endocrine therapy.I found that among ERpositive patients, when ERα is inhibited, the stemness score, which includes genes associated with EMT, tends to increase.As shown in Supplementary Section 1.7 and Supplementary Table 7, the predominant mode through which this occurs is via decreased E-cadherin expression, itself a negative regulator of EMT.This means that as ER-signaling decreases, EMT is predicted to increase.Conversely, I considered the ER-positive patients with high vs. low pre-treatment EMT scores.I found that the mean change in proliferation under ET among patients with high pre-treatment EMT scores (that is, those with EMT scores greater than the median value over all patients) was −1.15 standard variates, while those with low EMT scores was −1.53 standard variates.A Welch's t-test gives a p-value of 0.0017, indicating that patients with high pre-treatment EMT scores tended to be significantly less responsive, in terms of proliferation, to ET than those with low scores.
In [5], it was shown that the expression of genes associated with several phenotypic traits were altered in an MCF-7-derived cell line with silenced estrogen receptor.In order to directly test the model's capacity to recapitulate these findings, I downloaded a dataset of RNAseq data for nearly 700 cell lines [6,7], and simulated ERα inhibition using the MCF-7.The model tended to predict relatively small numbers of differentially-regulated genes, and predicted only a single gene to be upregulated (see Supplementary Section 1.2 above).Although not specifically noted in [5], this upregulated gene was CXCL8, a known enhancer of EMT [8].Of the genes that were observed to be downregulated in [5], the model correctly identified ERBB4 (associated with endocrine resistance), PRLR (luminal and epithelial phenotypes), TFF1 (luminal and invasion/metastasis phenotypes), HOXA1 and HOXC6 (invasion/metastasis), KRT15 and KRT19 (epithelial phenotype), and CLDN4 (epithelial phenotype).For each of these genes, the model predicted complete loss of expression.Surprisingly, E-cadherin expression was predicted to be unchanged, despite it being the primary driver of the stemness score change among TCGA patients.While imperfect, these combined results point toward the loss of luminal and epithelial phenotypes, increased invasiveness and metastatic potential, and increased endocrine resistance in estrogen receptor silenced MCF-7 cells.
I next sought RNAseq data associated with ER-positive breast cancer cell lines with high expression levels of Snail, Slug, and Twist-all markers of EMT [9].I again turned to data from [6,7]

Correlations among covariates used in survival analysis
In order to determine which, if any, covariates in the survival analysis were correlated, I computed the Spearman correlation coefficient between each covariate pair.Only one particularly strong correlation (i.e.having a correlation coefficient, ρ S , with magnitude greater than 0. The MDP leading to the decrease in FOXA1's transcriptional activation by FOS involved only one additional player, JUN (see Supplementary Figure 7).FOXA1 acts as an activator of both SIGNOR's "Apoptosis" and "Cell death" phenotypes.Under ET, the model predicts that the loss of ERα's activation of JUN is partially offset by the redirection of FOS's activation away from FOXA1 transcriptional activation, leading to a decrease in FOXA1 expression, and in turn a decrease in our cell death score.
The MDP leading to the decrease in HIP1's transcriptional activation by TFAP2A involved two additional steps (see Supplementary Figure 7).Like FOXA1, HIP1 acts as an activator of both SIGNOR's "Apoptosis" and "Cell death" phenotypes.Under ET, the model predicts a decrease in ERα's activation of GNA13, which is partially offset by increased activation of GNA13 by F2R (the first two steps in the pathway leading to decreased HK1 expression).
Then, in addition to the aforementioned redirection of F2R's activity away from EGFR (Figure 4 in the main manuscript), this increased F2R activity is also partially supported by redirection of TFAP2A's activity away from transcriptional activation of HIP1.This leads to decreased HIP1 expression, and a decrease in the cell death score.
Finally, the MDP leading to the decrease in CASP3's transcriptional activation by TP53 involves one additional player, COL18A1 (see Supplementary Figure 7).CASP3, like FOXA1 and HIP1, acts as an activator of both SIGNOR's "Apoptosis" and "Cell death" phenotypes, among several others.Under ET, the model predicts a decrease in ERα's activation of COL18A1, which is partially offset by redirection of TP53's activity away from CASP3 transcriptional activation.
Leveraging the data from [11], FOXA1, HIP1, and CASP3 all show evidence of downregulation under ET, although only FOXA1 is significantly so (with 1-sided t-test p-values of 0.0011, 0.15 and 0.19, respectively).
Cell Cycle.The φ-values for the cell cyling score indicated that approximately 75% of the observed score change under ET can be attributed to just three transcriptional activations.Each is associated with the direct activation of a target gene by ERα, although their impact on the score varies.The most important (φ ≈ 0.34) corresponds to the transcriptional activation of E2F1, which is an activator of SIGNOR's "G1/S transition" phenotype, and in turn, an activator of our cell cycle score.The second most important (φ ≈ 0.26) corresponds to the transcriptional activation of CDKN1A, which is included as an inhibitor of SIGNOR's "Cell cycle progress" phenotype, as well as an activator of its "Cell cycle block" and "Cell cycle exit" phenotypes, and thus a strong inhibitor of our cell cycle score.The third most important (φ ≈ 0.15) corresponds to the transcriptional activation of CCND3, which is included as an activator of SIGNOR's "Cell cycle exit" phenotype, and thus an inhibitor of our cell cycle score.All told, the loss of ERα activity under ET is predicted to lead to a decrease in expression of each of these genes, giving rise to a mixed effect wherein the decrease in E2F1 decreases cell cycle activity, while the decreases in CDKN1A and CCND3 lead to increases.
As before, I leveraged the data from [11], and found that E2F1, CDKN1A, and CCND3 all show evidence of downregulation under ET, but that only E2F1 is significantly downregulated (with 1-sided t-test p-values of 6.4×10 −5 , 0.30 and 0.31, respectively).
Immune.The observed change in the immune activity score under ET was almost solely associated with a predicted decrease in transcriptional repression of IL-8 by ERα (φ ≈ 0.94).IL-8 is an activator of SIGNOR's "Inflammation" phenotype, which is, in turn, an activator of the immune activity score.IL-8 expression is known to negatively correlate with ERα expression, be regulated by both ERα and FOXA1 among others, and has been associated with increased invasiveness, proliferation, metastatic potential, and endocrine resistance [12][13][14].Nevertheless, no additional evidence could be found to directly support the predicted increase in IL-8 expression or associated inflammation within the breast TME under ET, and in fact, the data in [11] indicates an (insignificant) decrease in IL-8 expression (p ≈ 0.23).
Stemness.The vast majority of the change in stemness score under ET was associated with two transcriptional activations, namely those of CDH1 by ERα (φ ≈ 0.52), and NANOG by a complex comprised of RARA and RXRA (φ ≈ 0.42).CDH1 is an inhibitor of SIGNOR's "Epithelial-mesenchymal transition" phenotype, and in turn an inhibitor of our stemness score, while NANOG is an activator of SIGNOR's "Pluripotency" phenotype, and thus an activator of our score.
The loss of ERα activity under ET is predicted to decrease CHD1 expression, leading to an increase in stemness.
The predicted regulation of NANOG is more subtle (see Supplementary Figure 7).The MDS that proceeds from NANOG involves: the loss of activation of FOX1A by ERα; the subsequent loss of activation of FOX1A's downstream target Vitronectin; reallocation of the activity of Thrombospondin-1 away from its inhibition of the RARA:RXRA complex toward activation of Vitronectin; ultimately leading to enhanced transcriptional activation of NANOG and the predicted increase in the stemness score.
I again leveraged the data from [11], and found that the gene CDH1 was significantly downregulation under ET (p ≈ 0.016), and that, although not significant, NANOG showed a trend toward upregulation (a one-sided t-test for upregulation yielded a p-value of 0.25).
Metastasis.Despite deep reservations about the value of the metastatic potential score (see Supplementary Section (φ ≈ 0.89; see Supplementary Figure 7).MET is an activator of SIGNOR's "Metastasis" phenotype, and is thus an activator of our metastatic potential score; under ET, the model predicts that activation of FOX1A by ERα will decrease, and in turn, inhibition of the MET gene will also decrease, leading to enhanced transcription, and increased metastatic potential.The data from [11] indeed shows significant upregulation of MET under ET (p ≈ 0.0097).
I note that MET has been associated with basal-like breast cancers, and has been considered as a therapeutic target in that context [15].It was also noted in Section 2.2 of the main manuscript that the increase in the metastasis and immune scores may be related to ET causing a shift toward a more basal-like state.It is possible that part of this shift may be through enhanced MET expression.This, of course, raises the possibility that some ER-positive patients may benefit from dual-blockade of ERα and MET.

Proposed microenvironmental SCSTA
The bulk RNAseq samples used in this manuscript represent mixtures of various cell types.In its current form, the model can not account for the types of intercellular signaling that may be present (e.g.immune evasion via PD-1/PD-L1, etc.).Nevertheless, one exciting avenue presents itself.One could imagine employing a strategy in which the bulk RNAseq is first deconvolved and purified in order to reveal cell-type specific mixing fractions and associated expression profiles.Methods like CIBERSORTx, among others [16], could be brought to bear in this regard.Then, a compartmentalized SCSTA model could be constructed, such that each cell type represents a different compartment within a shared microenvironment (see Supplementary Figure 10).Each compartment could be thought of as its own sub-model, with its own gene expression-based sets of constraints.Importantly, however, the compartments can be coupled through signaling interactions among secreted and/or cell surface proteins.Specifically, intracellular signaling proteins within each cell type would be capable of interacting with other proteins of the same cell type, but the secreted or surface-bound proteins would be able to engage in interactions with the surface-bound proteins of any cell-type in the full model.Solution of this compartmentalized model would proceed in much the same way as outlined in this manuscript.This would enable the model to predict the effects of inter-cellular signals, and simultaneously serve to mitigate errors or uncertainty that may otherwise arise (e.g. if the model predicts pathways activity through proteins known to be expressed by some cell types present in the bulk RNA sample, but not in the tissue of interest).
1.9 Minimal Data Requirements for SCSTA Several minimal data requirements exist for the use of the methods presented here.SCSTA requires some sort of directed graph representing the signal transduction network; as noted in the main manuscript, this could come from either an existing knowledgebase (like OmniPath or some similar repository), or be constructed through some sort of network reconstruction algorithm.SCSTA, as envisioned here, also requires gene expression data, including at a minimum, RNAseq data, although paired transcriptomics and proteomics could be leveraged.If proteomics is not available, some sort of mapping between transcript expression and protein expression is required; here I used estimates for protein-to-transcript ratios extracted from [17].Semi-quantitative transcriptomics or proteomics data, such as microarray data, or multiplexed immunofluorescence would require some sort of data normalization scheme in order to map measured values directly to expression level.Finally, I note that significant variability exists in gene expression and in patient outcomes; in order to make meaningful claims connecting predicted pathway activities or perturbations to patient survival, researchers will, in general, require relatively large numbers of patients' transcriptomes.
p represents the parent edges of node n, and e g p represents the parents edges of the corresponding gene.
, and subsetted it to only the breast cancer lines.I then computed the median TPM for ESR1 expression, finding a value of 2. I next consulted the literature to determine which ER-positive cell lines are in common use.I found that MCF-7, T47D, and ZR-75-1 are the most common[10], and of them, the lowest ESR1 expression level was that of ZR-75-1, taking a value of 4 TPM in the transcriptomics dataset-still larger than the median value.I took this value of 4 TPM to be my cutoff for ESR1 positivity, and then sought the cell lines with the largest expression levels of Snail, Slug, and Twist.The highest Slug and Twist expression levels occurred in the cell line JIMT-1 (with a Slug expression of 130 TPM, Twist expression of 35 TPM, and ESR1 expression level of 7 TPM), while the highest Snail expression level occurred in BT-483 (with Snail expression of 20, and ESR1 expression of 29).I then simulated the impact of ET in each cell line, finding that neither cell line's proliferation score changed appreciably.JIMT-1's proliferation score decreased by about 8.1 × 10 −3 %, while BT-483's proliferation score decreased by about 0.6%.I note that JIMT-1 is not generally considered ER-positive, despite its ESR1 expression level being higher than that of ZR-75-1 in this dataset.While by no means conclusive, this indicates that (arguably) ER-positive cell lines with high expression levels of EMT markers should indeed be insensitive to ET.

5 )
was observed.This was observed between the predicted change in stemness score and ESR1 expression level, with a correlation coefficient of 0.61.This indicates that tumors with high ESR1 over-expression are predicted to show larger increases in stem-like behaviors during ET.Beyond that, other correlations tended to be small (|ρ S | ≤ 0.2 in magnitude) or modest (0.2 < |ρ S | ≤ 0.5).As noted in Section 2.2.1 of the main manuscript, the proliferation score change showed only modest Spearman correlation coefficients with ESR1 expression, and the change in stemness score.The change in cell death score followed a similar trend, again showing only Spearman correlations between it and both ESR1 expression and the change in stemness score (ρ S values of 0.26 and 0.34, respectively).The change in cell cycling behaviors showed only small spearman correlations.The change in immune score, like proliferation and death, was modestly correlated with the ESR1 expression and the change in stemness (ρ S values of 0.23 and 0.25, respectively).In addition to those noted above, the change in stemness score was also modestly correlated with patient age at diagnosis (ρ S = 0.27).The change in metastasis score only showed a single modest correlation with ESR1 expression (ρ S = 0.26).Finally, among the clinical data covariates (T-, N-, and M-stages, ER, PR, and Her2 statuses, and age at diagnosis), modest correlations were observed between T-stage and N-stage (ρ S = 0.28), and between age at diagnosis and ESR1 expression.1.7Pathway Analysis of other behaviorsCell Death.Computing the φ values for the cell death behavior yielded a diverse mixture of both pro-and anti-cell death activations.The three largest φ values that tend to lead to increased cell death are closely related to those that also led to decreased proliferation described in the main manuscript, namely the decreased activations of the genes AKT1, AKT2, and HK1 (with φ values of approximately 0.12, 0.13, and 0.077, respectively).The three largest φ values that led to decreased cell death included decreases in the transcriptional activations of: FOXA1 by FOS (φ ≈ 0.087); HIP1 by TFAP2A (φ ≈ 0.082); and CASP3 by TP53 (φ ≈ 0.057).

1. 4
above), I nevertheless analyzed the drivers of its predicted change under ET.The dominant cause for the predicted change in the metastatic potential score was a predicted decrease in transcriptional inhibition of MET by FOX1A

Supplementary Figure 2 :
Boxplots of post-ET metastatic potential scores, stratified by M-stage.No clear difference is evident.A Welch's t-test was performed, resulting in a p-value of 0.60.

Supplementary Figure 3 :
Histograms of predicted changes in behavior scores.For each of the six behavior scores, the predicted changes (post-ET minus pre-ET) among ER-positive patients is shown.

Supplementary Figure 4 :
2D histograms showing the bivariate distributions of each pair of predicted score changes and clinical features used in Section 2.2.1 of the main manuscript.In the vast majority of covariate pairs, no obvious trend is observed.In certain select pairs, modest correlations are found, and in one pairnamely the change in stemness score and ESR1 expression level-a large correlation is observed (see Supplementary Section 1.6).

Supplementary Figure 5 :
Scatter plot of predicted change in proliferation score vs. pre-treatment score.Each point represents an ER-positive patient with x-value indicating the pre-treatment score, and y-value indicating the predicted change in score.

Supplementary Figure 6 : 7 :
Kaplan-Meier plot of patient survival grouped based on both pretreatment proliferation score, and predicted change in score under ET (both binarized at their respective median values).Note that the change in proliferation score is defined as the difference between predicted post-and pretreament score; because almost all patients' scores decrease, higher values of the change represent smaller such decreases, while lower values indicate larger decreases.The blue curve (labeled +/+) indicates patients with a high score change (small decrease) and high pre-treatment score.They show the worst overall survival.The orange (+/-) and green (-/+) curves indicate patients with high (low) score changes and low (high) pretreatment values.They tend to show intermediate survival.The red curve (-/-) indicates patients with low change (large decrease) and low pretreatment proliferation score.They show the best survival.Survival is shown in days.Width of colored regions surrounding each line indicates 95% confidence interval.Cartoon indicating how ET is predicted to affect several genes associated with the behavior scores.For each MDP, the affected gene is shown in green on the left, and the pathway proceeds to the right, terminating at ERα.In all interactions, parent nodes are depicted above child nodes.The pathways regulating FOXA1, HIP1, and CASP3 (top three) are all related to the cell death score, while those regulating NANOG and MET (bottom two) are tightly coupled, and relate to stemness and metastasis, respectively.