Transcriptional landscape of oncogene-induced senescence: a machine learning-based meta-analytic approach

Oncogene-induced senescence (OIS) is highly heterogeneous, varying by oncogenic signals and cellular context. While its dual role, in the initial inhibition potentially later leading to promotion of tumors through the senescence-associated secretory phenotype, is still a matter of debate, it is undeniable that OIS is critical to understanding tumorigenesis. A major obstacle to OIS research is the absence of a universally accepted marker. Here, we present a robust OIS-specific transcriptomic secretory phenotype, termed oncogene-induced senescence secretory phenotype (OIS-SP), which can identify OIS across multiple biological contexts from in vitro datasets to in vivo human samples. We apply a meta-analytic machine learning pipeline to harmonize a deliberately varied selection of Ras-Raf-MEK-induced senescence datasets of differing origins, oncogenic signals and cell types. Finally we make use of bypass data to identify key genes and eliminate genes associated with quiescence, so identifying 40 OIS-SP genes. Within this set, we determined a robust core of five OIS-SP genes ( FBLN1 , CXCL12 , EREG , CST1 and MMP10 ). Importantly, these 5 OIS-SP genes showed clear, consistent regulation patterns across various human Ras-Raf-MEK-mutated tumor tissues, which suggests that OIS-SP may be a novel cancer driver phenotype with an unexpectedly critical role in tumorigenesis.


Introduction
Oncogene-induced senescence (OIS) refers to irreversible premature cell-cycle arrest facilitated by oncogenically driven hyperproliferation. In the absence of additional mutational hits or other dysfunctional regulatory mechanisms, OIS is believed to halt abnormal growth in the early stages of tumorigenesis and to maintain affected cells in a benign, premalignant state (Braig and Schmitt, 2006;Braig et al., 2005). The first seminal paper on OIS described cell-cycle arrest and senescent phenotype driven by the overexpression of H-Ras (Serrano et al., 1997). While more than 50 oncogenes and tumor suppressive genes have been reported to induce senescence since then (Gorgoulis and Halazonetis, 2010), Ras and its downstream effectors Raf and MEK remain the most well-studied and well-established in vitro (Lin et al., 1998;Michaloglou et al., 2005;Narita et al., 2003;Wei et al., 1999) and in vivo (Braig et al., 2005;Collado et al., 2005;Dankort et al., 2007;Dhomen et al., 2009;Sarkisian et al., 2007) models of OIS. Ras and Raf have also been reported to be few of the most frequently mutated oncogenes in cancers (Mendiratta et al., 2021). Indeed, senescent cells also possess much tumorigenic potential (Lee and Schmitt, 2019). Alterations to the surrounding milieu through the senescence associated secretory phenotype (SASP) is a key characteristic that is essential in reinforcing senescence (Ohtani, 2022) and when prolonged, can contribute to cancer development (Cuollo et al., 2020;Park et al., 2021). SASP consists of inflammatory, growth-inducing and remodeling factors shown to be involved in tumor suppression, embryonic development and wound healing (Kuilman et al., 2008;Muñoz-Espín et al., 2013), but also promotes malignancy in nearby cells (Gonzalez-Meljem et al., 2018). Accumulating evidence reports that SASP may lead to malignant transformation through protumorigenic activities (Krtolica et al., 2001;Sparmann and Bar-Sagi, 2004). Thus, OIS occupies a complex position at the crossroads both towards and away from cancer.
There are no established criteria by which OIS can be conclusively identified, and the senescent phenotype itself is not clearly defined: senescence is commonly identified by markers such as proliferative arrest, senescence-associated beta-galactosidase (SA-β-gal) activity, activated Rb/p16 and p53/p21 pathways and the induction of the senescence-associated secretory phenotype (SASP) (Gorgoulis et al., 2019). However, these markers are not unique to the senescent phenotype, as they are found in other non-cycling cells, such as quiescent cells (Dulic, 2011;Kippin et al., 2005;Yang and Hu, 2005), or in response to other stresses, such as inactivation of Rb (Shapiro et al., 1995). Moreover, multiple oncogenic mutations do not always lead to senescence; under certain circumstances, differential regulation of epigenetic factors (Ito et al., 2018) or altered metabolite production  can lead to OIS bypass through inhibition or reversal of senescence. Compounding these difficulties, senescence is not a static event. Rather, its heterogeneity, including its transcriptome, is subject to changes over time.
Our goal was the identification of an OIS-specific transcriptional phenotype that is robust across heterogenous in vitro datasets and human in vivo samples through a meta-analytic machine learning approach. Thus it was important that we sampled the full heterogeneity of OIS by considering OIS datasets of varying conditions and origins to increase the chance that our model is independent of cell type, oncogenic signaling and other factors related to experimental and environmental conditions. To help ensure that the signals identified were unique to OIS, we also considered features that emerge individually in time-series, bypass and quiescence datasets, in order to identify genes that continuously increase or decrease expression with time and to rule out features shared with bypass and quiescence. The transcriptional patterns in the OIS secretome were analyzed by a novel state-of-the-art meta-analytic machine learning pipeline.

Results
We curated a total of 10 heterogenous OIS datasets from 9 cohorts, varying by oncogenic signaling, cell line origin, and microarray platform ( Table 1). The oncogenes include Ras, B-Raf and MEK; the cell lines include melanocytes and fibroblasts (WI-38 and IMR90). Two studies (GEO accessions GSE59522 and GSE144397) provide time series data.
One study, GSE2487, provides data on fibroblasts that bypassed OIS with papillomavirus oncoproteins E6 and E7 (E6/E7) as well as on MEKinduced senescent fibroblasts. Additionally, we collected 3 heterogenous quiescent datasets from 2 cohorts that vary by cell-line origin and culture condition. GSE144397 provides time series data of quiescent WI38 cells. GSE19864 provides two datasets of quiescent IMR90 cells. Refer to Table 1 for more details on the datasets. Abbreviations: OIS, oncogene-induced senescence; DMEM, Dulbecco's Modified Eagle Medium; FBS, fetal bovine serum; PV, papillomavirus.
Through a preliminary analysis of OIS datasets by RankProduct (RP), we identified 1287 genes that emerged exclusively in meta-analysis and excluded 441 genes identified exclusively in individual studies, yielding a total of 3370 genes (Fig. 1A). Through an intersection with 4 other meta-analytic methods (see Methods section), the number of differentially expressed genes (DEGs) was reduced to 3026, a result that was significant with false discovery rate adjusted p-value < 0.05. Table S1 shows significantly upregulated and downregulated genes produced by the 5-way meta-analysis described above. The majority of the top DEGs (Table S2) encodes cytokines, matrix remodeling factors and regulators that make up senescence associated secretory phenotypes (SASP), suggesting that SASP may be viewed as a defining phenotype of OIS.
Given the results of the above preliminary analysis, we decided to build the OIS transcriptomic profile starting from a list of human secretory genes (Fig. 1B). We referenced the Human Protein Atlas for a list of 1903 human genes that are predicted to secrete protein products (Uhlén et al., 2019) to initially screen for relevant genes in the time-series and bypass datasets. When we applied fuzzy clustering, an unsupervised learning method, to these secreted genes within three time-series studies from GSE144397 and GSE19864, we observed both significant overall increase and decrease in expression of groups of genes  The diagram to the right shows the number of genes that overlap among 5 meta analytic methods: Fisher, RP, Random Effects Model (REM), Fixed Effects Model (FEM) and Stouffer. B) A Venn diagram showing the intersection among the meta analysis genes from the 10 OIS datasets, fuzzy clustering genes from the 3 OIS time-series datasets and 1 OIS bypass dataset, and excluding the genes found in the meta analysis of quiescence cells. A list of the 40 OIS-SP genes in green cross-hatched subsets is available as Table S3. C) Fuzzy clustering of secretory genes applied to the 3 OIS time-series datasets: the 3 graphs to the left show clusters of genes that progressively decrease expression and the 3 graphs to the right show those that progressively increase expression with time. Fuzzy clustering applied to 1 OIS bypass dataset: the 6 graphs in the center show clusters of genes that are differentially regulated in proliferating cells (IMR90 hTERT), OIS cells (IMR90 hTERT MEK:ER) and OIS bypass cells (IMR90 hTERT MEK:ER E6/E7). For analysis, genes from modules 1 and 4 were selected as their expression clearly changed under OIS and remained stable under other conditions. OIS, oncogene-induced senescence; RP, RankProduct; REM, Random Effects Model; FEM, Fixed Effects Model; MAG, meta-analysis genes; FC, fuzzy clustering; OIS-SP, oncogene-induced senescence secretory phenotype.
during the duration of induction in all three studies, which allowed us to identify distinct modules (Fig. 1C). Furthermore, we applied fuzzy clustering to cells in three different states provided by GSE 2487: proliferating, OIS and OIS bypass.We identified genes that are differentially regulated in OIS, OIS bypass and proliferation and selected ones exclusive to OIS (Fig. 1C). As the final step, we extracted the genes that overlap between the 5-way meta-analysis and fuzzy clustering and excluded the genes associated with quiescence (Fig. 1B). This process produced a list of 40 upregulated or downregulated genes, which we define as the oncogene-induced senescence secretory phenotype (OIS-SP) ( Table S3). The regulation pattern of OIS-SP genes was consistent across all datasets, as was the upregulation of p16, a widely used senescence biomarker (Fig. 2). For the purpose of dimension reduction and robust cross validation, we applied penalized regression, a supervised learning method, and leave-one-study-out cross validation (LOSOCV) to OIS-SP. In the cross validation, we utilized 7 homogenous cohorts of Ras-induced fibroblasts to train the model and 3 heterogenous cohorts of varying oncogenic signaling and cell types for a rigorous external blind validation (Fig. 3A). Using the Efficient Parameter Selection via Global Optimization algorithm (Frohlich and Zell, 2005), we determined the regularization parameters and the core feature number of 5 ( Fig. 3B and C). This successfully yielded 5 core features with non-zero coefficients, which are negative for fibulin 1 (FBLN1) and C-X-C motif chemokine ligand 12 (CXCL12) and positive for epiregulin (EREG), cystatin SN (CST1) and matrix metallopeptidase 10 (MMP10) (Fig. 3D, Table S4). The 5 OIS-SP genes discriminate OIS with perfect AUROC and AUPRC (Fig. 3E).
We further investigated whether OIS-SP appears in different types of human cancer tissues carrying Ras/Raf/MEK mutations. From The Cancer Genome Atlas (TCGA) patient tumor samples, we selected those containing high Ras/Raf/MEK mutational burden and more than 5 pairs of normal adjacent to tumor (NAT) and primary tumor (PT) tissues to reduce the risk of confounding bias. Lung adenocarcinoma (LUAD) and thyroid carcinoma (THCA) samples satisfied these requirements. For the most part, we observed that the pattern of regulation of the 5 OIS-SP genes is consistent between the in vitro OIS models and the TCGA Fig. 2. Consensual CXCL12 and CDKN2A (p16) transcription patterns observed in OIS across 10 datasets. On top, the box plots show an upregulation of CDKN2A, a gene that codes for p16, in OIS cells relative to non-senescent cells in each of the 10 datasets. The increasing trend is also observed in the line graph to the right, which plots time-series data provided by 3 OIS datasets. On the bottom, box plots show a downregulation of CXCL12, an OIS-SP gene, in OIS cells relative to nonsenescent cells in each of the 10 datasets. The decreasing trend is also observed in the line graph to the right, which plots time-series data provided by the 3 OIS datasets. OIS, oncogene-induced senescence; OIS-SP, oncogene-induced secretory phenotype; CDKN2A, cyclin dependent kinase inhibitor 2A; CXCL12, C-X-C motif chemokine ligand 12.
tumor patient tissues: a significant downregulation of FBLN1 and CXCL12 and upregulation of EREG, CST1 and MMP10 (Fig. 4), though in the case of lung adenocarcinoma (LUAD), EREG was not significantly upregulated This indicates that OIS-SP is not a phenomenon restricted to cell lines and that studying the role of OIS in the tumor microenvironment is worthy of further study.

Discussion
Our study defines a secretory transcriptomic signature evident in 10 experimental oncogene-induced senescence (OIS) datasets, which vary by the origin, type of oncogene and cell type. In developing our OIS signature, we addressed features expressed in states that also exhibit senescent characteristics yet are distinctly different from OIS, such as quiescence and bypass, in order to build an OIS-specific profile. Given that senescence is progressive, we considered features that increase or decrease expression with time as assessed by time-series datasets. Our state-of-the-art computational pipeline utilizes meta-analytic methods including Fisher, RankProduct, Random Effects Model, Fixed Effects Model and Stouffer and machine learning techniques such as fuzzy clustering, which is well-suited for handling biological data (Dembélé and Kastner, 2003), and penalized regression. As a result, we identified 40 oncogene-induced secretory phenotype (OIS-SP) genes and 5 core OIS-SP genes that are consensually regulated across all 10 datasets (Fig. 5).
Our results contribute to the ongoing conflicts among previously  reported findings. For one, upregulation of C-X-C motif chemokine ligand 12 (CXCL12), an inflammatory chemokine, was reported in aged prostate fibroblasts (Begley et al., 2005) and in senescent tumor cells from B-RafV600E-expressing papillary thyroid carcinoma (Kim et al., 2017) although its downregulation was observed in replicative and UVA-induced senescent fibroblasts (Yoon et al., 2018). In our analysis, CXCL12 was downregulated in Ras/Raf/MEK-induced senescent cells and in analogously mutated patient tumors. While CXCL12 is known to be a strong chemotactic for lymphocytes  and promotes angiogenesis and tumor growth in breast cancers through carcinoma-associated fibroblasts (Orimo et al., 2005), it is interesting to note that it has been reported to participate in tumor immunosuppression by altering the distribution of dendritic-cells (DCs), where myeloid DCs are inhibited and plasmacytoid DCs are recruited in human ovarian epithelial tumor cells (Zou et al., 2001). Knockdown of cystatin SN (CST1), a cysteine protease inhibitor, was reported to induce senescence in SW480 colon cancer and MDA-MB-231 breast cancer cells (Oh et al., 2017). Meanwhile, upregulation of CST1 was reported in both replication-and Ras-induced senescent fibroblasts (Keppler et al., 2011). Likewise, our study shows that CST1 was upregulated in Ras/Raf/MEK-induced senescent cells and in patient tumors. The discrepancies can be attributed to the differences in cell type, stimuli to senescence induction, progression of senescence and experimental choices of each study, which emphasize the importance of meta-analysis in demonstrating the pooled effect across heterogenous data. Fibulin 1 (FBLN1) is a glycoprotein that is incorporated into the extracellular matrix (ECM) with a role in stiffness and fibrosis (Liu et al., 2019;Yasmin et al., 2018), with varying expressions in different cancer types (Gong et al., 2020;Xiao et al., 2014). Epiregulin (EREG) and matrix metallopeptidase (MMP10) are canonical SASP biomarkers reported to be upregulated in senescence (Coppé et al., 2010), with several reports of contribution to cancer development. EREG is upregulated in cancer-associated fibroblasts (CAFs), involved in migration and invasion (Neufert et al., 2013;Wang et al., 2019;Yoshikawa et al., 2013). Likewise, MMP10 may play a role in angiogenesis, invasion and metastasis in various tumors (Deraz et al., 2011;Zhang et al., 2014). It has also been reported that MMP10 shifts the macrophage activation from proinflammatory M1 cells to anti-inflammatory M2 cells, possibly contributing to a immunosuppressive environment (McMahan et al., 2016).
We note that there is evidence of shared physical interactions between the protein members of this core set and other proteins: EREG and CXCL12 share a common interaction partner CXCL9 (Lyne et al., 2022). Similarly CXCL12 and FBLN1 share FN1 as a common interaction partner. This hints that these three members of the core OIS-SP set may be closely involved in the same process. Consistent with this, we note that FN1 was identified as upregulated in the five-way meta-analysis. We note also that FBLN1 and MMP10 share pathway annotation for involvement in extracellular matrix organization, while CXCL12 and EREG share pathway annotations including for ERBB4 signaling and ESR-mediated signaling (Lyne et al., 2022). Thus further studies on the individual or combined role of the proteins encoded by these genes could help elucidate the contribution of OIS to tumorigenesis. Possibly, this could include perturbation studies to recreate the core OIS-SP expression profile in cells or mouse models. Although our study focuses on Ras/Raf/MEK-induced senescence and -mutated cancers, it would be beneficial to examine a wider range of cancers for the prevalence of OIS-SP outside of Ras/Raf/MEK cancers.
While the core 5 OIS-SP genes have been studied for their roles in senescence or cancer individually, little is known about their role linking OIS and cancer. We explored the in vivo patterns of OIS-SP genes through an examination of their expression in TCGA patient tumor samples carrying the same oncogenic mutations. Interestingly, we observed that the 5 core OIS-SP genes consistently show a unified pattern of regulation across tumor samples. Previously, it was suggested that senescence serves as a potent mechanism against tumorigenesis, deterring progression into malignancy (Braig and Schmitt, 2006;Braig et al., 2005). While there have been reports challenging this notion and implicating senescence in tumorigenesis, they have relied on markers such as SA-β-Gal and p16 (Lee et al., 2002;Stein et al., 1999) or focused on a particular trait of senescence, such as SASP (Eggert et al., 2016;Laberge et al., 2015;Malaquin et al., 2013). Our study provides robust transcriptomic evidence to the unexpected proposition that OIS phenotype may be dominant in tumor cells . While it may be reasonable to hypothesize that the heterogeneous composition of TCGA tumor tissues mainly consist of uncontrollably proliferative tumorigenic cells, quite paradoxically, our findings suggest otherwise and show that the OIS phenotype is markedly strong across human tumor tissues. Potentially, OIS could be a novel driver phenotype in cancers, explained by the behavior of OIS-SP, or a new type of cell that possesses senescent phenotype and could play a prominent role in tumorigenesis. The impact of the OIS on tumor progression is ill-defined, and future studies elucidating the role of the OIS phenotype in tumors could provide novel strategies targeting tumors.
Over the past decade, it has become clear that SASP is a central hallmark of senescence. Following the widely accepted geroscience hypothesis, which posits that targeting senescence may alleviate chronic age-related diseases (Sierra and Kohanski, 2017), SASP has been established as one of the main targets in developing senotherapeutic strategies. Recently, the role of SASP in tumor microenvironment has garnered much attention, and though it has been extensively studied, an exact characterization of SASP is further complicated by various factors, including its seemingly contradictory role in the immune milieu, where it has been reported to recruit immune-suppressive myeloid cells (Di Mitri et al., 2014;Eggert et al., 2016) but also contribute to chronic inflammation (Franceschi and Campisi, 2014). Therefore, while it is certain that SASP is universal across senescence, it is important to acknowledge that SASP is highly variable by the type of senescence-inducing stimuli and that it is necessary to study SASP in an inducer-specific manner. Our transcriptomic meta-analysis revealed that the majority of the top ranked differentially expressed genes consisted of secretome, demonstrating that oncogene-induced specific secretory phenotype may be viewed as a hallmark phenotype of OIS that may mediate its pathophysiological effects. Moreover, we show that this OIS-specific phenotype is clearly expressed across various stages of cancer and in the view that 'senescence escape' of cancer cells leads to tumorigenesis, our study may provide important clues to explaining the cell fate that is determined at the crossroad between senescence and cancer.
In closing, our study demonstrates how OIS-SP, which is derived from a computational pipeline, not only detects OIS in experimental datasets but also in paired patient samples with oncogenic mutations across various tumor stages. OIS-SP genes can be utilized as a specific senescence biomarker that overcomes the limitations of widely used senescence biomarkers, such as p16 and senescence-associated beta galactosidase, that are also observed in nonsenescent states (Sharpless and Sherr, 2015). The pipeline for OIS-SP discovery can be considered to be relatively unbiased, as it does not explicitly presume distinct characteristics of OIS or rely on previously known senescence markers, besides the bias inherent in the input datasets. In light of the prevalence of OIS in the tumor microenvironment as highlighted by our study, we anticipate that the core OIS-SP genes will be useful markers of OIS for further studies and that the larger set of OIS-SP genes will help yield insights into the biology underlying OIS and its role in cancer development.

Dataset composition
GEO datasets GSE59522, GSE54402, GSE60652, GSE19864 and GSE75207 provide data on H-RasG12V-induced senescence in IMR-90 fibroblasts (Table S5). GSE144397 provides time series data on Rafinduced senescence and H-RasG12V-induced senescence in WI-38 fibroblasts. GSE46801 observed B-RafV600E-induced OIS in melanocytes and GSE33710 observed H-RasG12V-induced senescence in WI-38 fibroblasts. GSE2487 provides data on MEK-induced senescent fibroblasts as well as fibroblasts that bypassed OIS. GSE144397 provides time series data of quiescent WI38 cells and GSE19864 data on quiescent IMR90 cells. Human Protein Atlas curated a list of 1903 genes that are predicted to secrete at least one protein product. The Cancer Genome Atlas (TCGA) patient tumor datasets (Version 2016_01_28) are from the Broad GDAC Firehose database (http://gdac.broadinstitute.org).

Data processing
Affymetrix raw data was preprocessed using robust multi-array average (RMA) (Hughey and Butte, 2015) while no further processing was applied to data from other platforms. The data was summarized at the gene level by mapping the probe set ID with highest interquartile range (IQR) to the gene. For batch effects removal and cross-study normalization, ComBat, an sva R package, was used (Leek et al., 2012).

5-way meta analysis
An ensemble of five univariate meta-analytic techniques was applied to the search for differentially expressed genes. The five methods include four parametric methods Stouffer (combined z-scores), Fisher (combined p-values), Fixed Effects Model (FEM), Random Effects Model (REM) (Choi et al., 2003) and one non-parametric method, RankProduct (RP) (Hong et al., 2006).

Machine learning pipeline
Meta-analysis was performed on OIS datasets and again on quiescent datasets. Human Protein Atlas was referenced for the list of 1903 human genes that are predicted to secrete protein products (Uhlén et al., 2019). Fuzzy clustering was performed on secretory genes within 3 time-series OIS datasets and within OIS bypass datasets. Fuzzy c-means clustering (FCM) is a type of soft clustering method that assigns a probability to the degree of membership of a data point to a cluster (Pal et al., 1996). In our model, it was performed using the cmeans: Fuzzy cmeans clustering function from R package e1071 to identify clusters of genes with similar patterns of expression changes. The value of parameter K, which is the number of clusters, was manually determined by the trend demonstrated by the graph of data. For the bypass, time-series and the rest of the datasets, K was set to 6, 4 and the default value, respectively. The resulting transcriptomic signature of OIS contains genes common to the outputs of all the 5-way meta-analysis of OIS datasets and the fuzzy clustering of time-series and bypass datasets, but excluding the genes associated with the 5-way analysis of quiescent data. The resulting gene set was further processed and validated through penalized regression (Zou and Hastie, 2005). We addressed the issue of high-dimension low-sample size through the use of a penalized regression that performs both shrinkage and variable selection to improve prediction accuracy. The regularization parameters were optimized by leave-one-study-out cross validation (LOSOCV), where one study is assigned to the validation set and the rest of the studies to the training set. All statistical analyses and calculations were executed through R version 3.2.3 (R Foundation for Statistical Computing Platform).

Data and code availability
The data sets and R code in this paper are publicly available online at https://github.com/malcogene/OIS_meta.