Recurrence-Associated Multi-RNA Signature to Predict Disease-Free Survival for Ovarian Cancer Patients

Ovarian cancer (OvCa) is an intractable gynecological malignancy due to the high recurrence rate. Several molecular biomarkers have been previously screened for early identifying patients with a high recurrence risk and poor prognosis. However, all the known studies focused on a single type of RNAs, not integrating various types. This study was to construct a new multi-RNA-based model to predict the recurrence and prognosis for OvCa patients by using the messenger RNA (mRNA, including long noncoding RNA (lncRNA)) and microRNA (miRNA) sequencing data of The Cancer Genome Atlas database. After univariate Cox regression and least absolute shrinkage and selection operator analyses, a multi-RNA-based signature (2 miRNAs: hsa-miR-508, hsa-miR-506; 1 lncRNA: TM4SF1-AS1; 11 mRNAs: MAGI3, SLAMF7, GLI2, PDK1, ARID3A, PLEKHG4B, TNFAIP8L3, C1QTNF3, NDUFAF1, CH25H, TMEM129) was generated and used to establish a risk score model. The high- and low-risk patients classified by the median risk score exhibited significantly different recurrence risks (89% versus 61%, p < 0.001) and survival time (the area under the receiver operating characteristic curve (AUC) = 0.901 for 5-year disease-free survival (DFS)). This risk model was independent of other clinical features and superior to pathologic staging for DFS prediction (AUC, 0.906 versus 0.524; C-index, 0.633 versus 0.510). Furthermore, some new interaction axes were revealed to explain the possible functions of these RNAs (competing endogenous RNA: TM4SF1-AS1-miR-186-STEAP2, LINC00536-miR-508-STEAP2, LINC00475-miR-506-TMEM129; coexpression: LINC00598-PLEKHG4B). In conclusion, this multi-RNA-based risk model may be clinically useful to stratify OvCa patients with different recurrence risks and survival outcomes and included RNAs may be potential therapeutic targets.


Introduction
Ovarian cancer (OvCa) is one of the frequently diagnosed, lethal malignancies in the female genital system worldwide. It was estimated that there were 22,240 new cases and 14,070 deaths in the United States in 2018 [1] while 52,100 new cases and 22,500 mortalities were reported in China in 2015 [2]. Surgical resection combined with chemotherapy is the firstline treatment for OvCa, which has been demonstrated to improve the prognosis of patients. However, the overall fiveyear survival rate remains lower (approximately 30%) [3,4] because more than 70% of patients [5] may experience recurrence, ultimately resulting in the treatment failure. erefore, early identification of patients with a high recurrence risk is necessary in order to schedule personalized therapies and improve prognostic outcomes.
Recently, with the rapid developments in molecular biology, several scholars devote themselves to identify molecular biomarkers for prediction of recurrence and survival for OvCa patients. For example, Yang et al. used the microarray data of GSE9891 and GSE30161 retrieved from the Gene Expression Omnibus (GEO) database and LASSO (least absolute shrinkage and selection operator) penalized regression analysis to identify a six-long noncoding RNA (lncRNA: including RUNX1-IT1, MALAT1, H19, HOTAIRM1, LOC100190986, and AL132709.8) recurrence signature.
is signature was found to predict the disease-free survival (DFS) for OvCa patients, with the area under the receiver operating characteristic (ROC) curve (AUC) of 0.813 (GSE9891, training), 0.697 (GSE9891, internal validation), and 0.711 (GSE30161, external validation) [6]. Dong et al. applied the support vector machine (SVM) method to screen a 19-microRNA (miRNA) signature that could well distinguish the recurrent from the nonrecurrent samples. Six of them (miR-193b, miR-211, miR-218, miR-505, miR-508, and miR-514) were found to be independently related to prognosis and were used to establish a risk score which was proved to significantly discriminate the recurrence-free survival (RFS) between the high-risk and lowrisk groups (AUC � 0.961 for e Cancer Genome Atlas (TCGA) sequencing dataset; AUC � 0.922 for GSE25204 microarray dataset; AUC � 0.921 for GSE27290 microarray dataset) [7]. Zhou et al. also utilized the SVM algorithm to screen 39 feature genes. ey distinguished the recurrence samples from nonrecurrence samples, with the prediction accuracies of 92.7%, 93.3%, 96.6%, and 90.4% for GSE17260, GSE44104, GSE51088, and TCGA datasets, respectively. e Kaplan-Meier (K-M) survival curve revealed that the survival time of patients with predicted nonrecurrent OvCa was significantly longer than that of patients with predicted recurrent OvCa [8]. Zhao et al. screened KCNN4 and S100A14 combination as the perfect recurrence prediction model (AUC � 0.5442-0.9524) [9]. However, all these studies focused on the prognostic values of single RNA type, not a combination of them [10]. e study on colorectal cancer indicated that a multi-RNA-based classifier model (AUC � 0.83) seemed to be more effective in stratifying patients who are at a high risk of mortality than lncRNAs (AUC � 0.56, p < 0.001), miRNAs (AUC � 0.67, p � 0.0291), or mRNAs alone (AUC � 0.76, p � 0.0051) [11]. erefore, it is of clinical significance to develop a multi-RNA-based prognostic signature for OvCa.
In this study, we aimed to collect the lncRNA, miRNA, and mRNA expression data from the TCGA database and construct a multi-RNA-based risk score model to predict the recurrence and DFS for OvCa patients. In addition, the underlying functions of lncRNAs, miRNAs, and mRNAs were predicted by constructing the coexpression, competing endogenous RNA (ceRNA) [12], and protein-protein interaction (PPI) [8] networks, which was also not simultaneously analyzed in previous studies on the recurrence signature of OvCa. Accordingly, we hypothesize that our study may provide a more effective, function-clear signature for the prediction of recurrent prognosis in OvCa patients.

Patient Datasets.
e mRNA (including lncRNA, n � 379) and miRNA (n � 498) expression data (level 3) with their clinical information were collected by searching the TCGA database portal (https://portal.gdc.cancer.gov/) on August 6, 2019, with the keyword of ovarian cancer. ese expression profiles were obtained by sequencing on the Illumina HiSeq 2000 platform. A total of 322 samples were retained for the following analysis because they satisfied the following inclusion criteria: (1) mRNAs and miRNAs were both sequenced; (2) information of recurrence (n � 80, nonrecurrent; n � 242, recurrent) and survival status was clearly described. Due to the fact that no other datasets simultaneously met the above two inclusion criteria, these 322 patients, obtained from the TCGA database, were randomly divided to training and validation sets to achieve internal validation. eir detailed clinical characteristics are presented in Table 1.

Construction of Multi-RNA-Based Prognostic Signature.
Univariate Cox regression analysis was first performed to preliminarily screen the DEGs, DELs, and DEMs that were associated with DFS using the "survival" package in R (version, 2.41-1; http://bioconductor.org/packages/ survivalr/). Subsequently, a multivariate Cox regression analysis was conducted to confirm their independence. Logrank p value < 0.05 was set as the statistical threshold for these two analyses. In order to further optimize the prognostic signature, an L1-penalized estimation-based Cox proportional hazards regression model (that is, LASSO) in the penalized package (version, 0.9-5; http://bioconductor. org/packages/penalized/) [15,16] was applied for the independent prognostic DERs, in which the value of penalty parameter lambda was chosen via cross-validation likelihood routine (1000 iterations). en, the prognostic risk score was established according to the expression levels of the RNAs (Exp DERs ) and prognostic coefficients ( β DERs ): Prognostic risk score � β DERs × Exp DERs . (1)

Assessment of the Prognostic Performance of the Risk Score
Model. Using the median risk score as the cutoff point, the OvCa patients were classified into high-risk and low-risk groups. K-M survival curve was drawn by using the "survival" package in R to observe the DFS differences between high-and low-risk groups. Furthermore, the ROC curves with the calculation of AUC were also plotted using the pROC in R (version, 1.14.0; https://cran.r-project.org/web/ packages/pROC/index.html) to estimate the prediction accuracy of this risk score for the 1-, 3-, and 5-year survival rate of high-and low-risk patients. ese analyses were performed for each dataset, respectively.
To verify that the multi-RNA signature was an independent prognostic factor, univariate and multivariate Cox regression analyses were performed for DFS using the risk score and several clinical features (including age, neoplasm histologic grade, pathologic stage, lymphovascular invasion, and vascular invasion) in each dataset. e superiority of the risk score model to clinical factors in survival prediction was also validated by K-M curve analysis for subgroups, calculation of AUC, and Harrell's concordance index (C-index) for each classifier. C-index was calculated using survcomp package in R (http://www. bioconductor.org/packages/release/bioc/html/survcomp. html) [17].

Function Prediction of the Prognostic RNAs.
Coexpression, ceRNA, and PPI networks were constructed to predict the possible functions of signature lncRNAs, miRNAs, and mRNAs, respectively. Pearson correlation coefficients (PCC) between DELs and DEGs were calculated to represent their possible coexpression relationships using cor.test function (https://stat.ethz.ch/R-manual/Rdevel/library/stats/html/cor.test.html) in R. Potential interactions between DEMs and DELs as well as between DEMs and DEGs were predicted using DIANA-LncBase (version, 2.0; http://carolina.imis.athena-innovation.gr/ diana_tools/web/index.php?r�lncbasev2/index-predicted) [18] and starBase database (version, 2.0; http://starbase. sysu.edu.cn/) [19], respectively. Only the DEL-DEM and DEM-DEG interactors that had an opposite expression trend were used to construct the ceRNA network [20]. e interactions between DEGs were predicted by mapping the genes to the Search Tool for the Retrieval of Interacting Genes (STRING; version, 10.0; http://string-db.org/) database [21]. PPI network was constructed based on interaction pairs with a combined score >0.4. All networks were visualized using Cytoscape (version 3.6.1; http://www. cytoscape.org/). Function enrichment analysis was performed for the genes in all three networks using the Enrichr online tool (http://amp.pharm.mssm.edu/Enrichr/) [22], in which Kyoto Encyclopedia of Genes and Genomes (KEGG), Bio-Carta, Reactome Pathways and Gene Ontology (GO) biological process (BP), and molecular function (MF) terms could be obtained. A p value < 0.05 was considered statistically significant.

Differential Expression Analysis.
A total of 12,728 mRNAs, 1,214 lncRNAs, and 547 miRNAs were annotated based on HGNC. Under the threshold of |log2FC| > 0.5 and FDR < 0.05, 540 of them were screened as DERs between recurrent and nonrecurrent samples (Figure 1(a)), including 451 DEGs, 68 DELs, and 21 DEMs (Figure 1(b)). e heat map showed that these DERs can obviously distinguish the recurrent from the nonrecurrent samples (Figure 1(c)).

Assessment of the Prognostic Performance of is Signature.
e risk score was calculated for each patient in each dataset (Supplemental Information 1), and the patients in each dataset were divided into a high-risk group and lowrisk group according to their median risk score. As shown in Figure 2(b), the patients with high-risk scores were shown to be at a high risk of recurrence (72/81, 89% versus 49/80, 61%; Chi-square � 16.5, p < 0.001) and had shorter DFS than those with the low-risk scores (Figure 2(c)). Also, patients with high-risk scores tended to express oncogenic RNAs, whereas tumor suppressor RNAs inclined to be highly expressed in patients with low-risk scores (Figure 2(d)).
To further confirm the prognostic performance of this signature, the K-M survival curve was drawn for the training (Figure 3(a)), validation (Figure 3(c)), and entire   (Table 3). Furthermore, the pathologic stage was also found to be an independent prognostic factor for OvCa patients, with a higher stage having poor DFS (Table 3; Figure 4). To investigate whether our multi-RNA-based signature added additional prognostic values to the commonly used pathologic stage, stratification analyses were also conducted. e results showed this multi-RNA-based signature can further distinguish the survival of patients at the same stage 3 (Figure 4). e survival of patients with stages 2 and 4 not significantly separated by this multi-RNA-based signature may be related to small sample size in these subgroups. e superiority of the risk score model to clinical factors in survival prediction was also validated by comparison of AUC and C-index for each classifier. As anticipated, the results revealed that the AUC of the multi-RNA-based model was relatively higher than that of the pathologic stage and single RNA, but nearly similar to the combined model using the multi-RNA and pathologic stage ( Figure 5). ese findings implied that our multi-RNA based signature may be a promising biomarker for clinical use to predict the DFS of OvCa patients.

Functional Analysis for Prognostic Signature.
To explore the underlying molecular mechanisms of these 14 RNAs in the signature, coexpression network between DELs and DEGs, ceRNA network among DELs, DEMs, and DEGs, and PPI networks between DEGs were constructed.
A total of 38 DELs were predicted to interact with 18 DEMs by the DIANA-LncBase database, which formed 90 interaction pairs, while 13 DEMs were predicted to interact with 66 DEGs by the StarBase database to constitute 104 interaction pairs. ese DEL-DEM and DEM-DEG interaction pairs were integrated to construct the ceRNA network (Figure 7), in which prognostic signature lncRNA TM4SF1-AS1 may function as a ceRNA for miR-186 to regulate STEAP2; prognostic signature miR-508 and miR-506 related ceRNA axes were LINC00536-miR-508-STEAP2 and LINC00475-miR-506-TMEM129, respectively. Function enrichment analysis showed that STEAP2 was involved in mineral absorption, transmembrane transport of small  (Table 5). A total of 265 PPI pairs were predicted between 223 DEGs using the STRING database, which was used to construct the PPI network ( Figure 8). In this network, prognostic GLI2, ARID3A, MAGI3, PDK1, and TMEM129 were included. Function enrichment analysis showed that GLI2 was involved in Hedgehog signaling pathway, Basal cell carcinoma, Hippo signaling pathway, pathways in cancer, and regulation of transcription, DNA-templated (GO:0006355); ARID3A was enriched in transcription regulatory region sequence-specific DNA binding (GO:0000976) and RNA polymerase II regulatory region sequence-specific DNA binding (GO:0000977); MAGI3 could interact with TNK2 to participate in positive regulation of protein phosphorylation (GO:0001934) and growth factor receptor binding (GO:0070851) ( Table 6). Additionally, PDK1 could interact with PFKFB4 and thus may take part in the GO terms as described in the function enrichment results of the ceRNA network.

Discussion
In the present study, we, for the first time, constructed a multi-RNA-based signature (consisting of 11 mRNAs, 2 miRNAs, and 1 lncRNA) to predict the recurrence and DFS for OvCa patients. e high-and low-risk patients classified by the median risk score were shown to exhibit significantly different recurrence risks and DFS time. e AUC was 0.975, 0.912, and 0.901 for prediction of 1-, 3-and 5-year DFS in the training set, respectively, which seemed to be higher compared with other single RNA signatures reported previously, including Yang et al.  [9]. is superiority of multi-RNA risk score to single RNA was also validated in our study according to the AUC (0.906 versus 0.621 for lncRNA, 0.666 for miRNAs, and 0.843 for mRNAs) and C-index results (0.633 versus 0.502 for lncRNA, 0.545 for miRNAs, and 0.618 for mRNAs) and was in line with the study of Xiong et al. [11].
Clinically, the pathologic staging is commonly considered as the gold standard for the assessment of the recurrence risk [24] and prognosis [25] of OvCa patients. is conclusion was also confirmed in our study, showing that pathologic staging was an independent factor for the prediction of DFS (that is, the prognosis was the worst in stage 4) after multivariate Cox regression analysis. However, some studies revealed that patients at the same pathological stage also had different recurrence risks and prognostic outcomes [26,27]. us, there is a strong need to explore more effective prognostic biomarkers to independently or jointly identify the recurrence risk and prognosis of patients. Recently, several studies on other cancers had demonstrated that the risk score established by molecular signature had a similar or even higher accuracy for prognostic prediction than clinical features and the AUC (and/or C-index) was the largest for the combination of them [11,28,29]. Hereby, we also calculated the AUC (and/or C-index) of our multi-RNA-based risk score and independent pathologic stage factor. In accordance with previous studies [11,28,29], we  BioMed Research International also found that the AUC (and/or C-index) of our multi-RNA-based risk score was relatively higher than that of the pathologic stage, and the predicted and actually observed survival was significantly correlated using the risk score (p � 4.674E − 07), but not pathologic stage (p � 6.886E − 01). e stratification analysis also showed that the risk score can further distinguish the survival of patients at the same stage 3. ese findings implied the insufficiency to use pathologic staging for prognosis prediction in the clinic and the advantage of our risk score. Of course, the optimum was still the combined model, with the AUC and C-index of 0.913 and 0.634, respectively.
ere was no study to investigate the roles of lncRNA TM4SF1-AS1 and it may be a novel signature gene for cancer. However, we can indirectly speculate its functions by its interacted miR-186-STEAP2 axis. Similar to mRNAs, lncRNAs also harbor miRNA-response elements (MREs) and thus they can completely bind to miRNAs, leading to fewer miRNAs interacted with mRNAs, while it is well known that miRNAs can negatively regulate gene expression by interfering their translation or stability (ceRNA hypothesis). Accordingly, the upregulation of TM4SF1-AS1 identified in our recurrent samples may result in the overexpression of STEAP2 and possible lower expression of  miR-186 in OvCa. is theory was validated in our study and previous studies: extensive evidence had suggested that miR-186 was a tumor suppressor gene, with a downregulated expression level in cancer tissues, cells [30,31], and blood of recurrent oral squamous cell carcinoma patients compared with controls [32], including OvCa [33]. Overexpression of miR-186 into cancer cells significantly inhibited proliferation, invasion, metastasis [29,34,35], and induced mesenchymal-to-epithelial transition, G1 cell cycle arrest, and cell apoptosis [33]. STEAP2 (Seven Transmembrane Epithelial Antigen of the Prostate 2) is a gene primarily expressed in the prostate and the ovary is the only other area with a significant expression [36]. e roles of STEAP2 in OvCa may be similar to prostate cancer in which STEAP2 was observed to be upregulated and associated with advanced stage and Gleason score [37,38]. e overexpression of STEAP2 induced the normal prostate cells to gain an ability to migrate and invade [36] while the knockdown of STEAP2 significantly reduced proliferation and migration of prostate cancer cells [39]. However, no study was performed to verify the interaction between miR-186-STEAP2 and TM4SF1-AS1, which may be the direction of our future search.
In addition to miR-186, we found that prognostic miR-508 also targeted STEAP2 and, hereby, miR-508 may be possibly downregulated in OvCa, which was also proved in our study and another literature [40]. Transfection of cancer cells with miR-508 mimics significantly suppressed cell proliferation, migration, and invasion while the use of miR-508 inhibitors resulted in an opposite trend [40,41]. More interestingly, LINC00536 may be an upstream gene to influence the interaction between miR-508 and STEAP2 by acting as a ceRNA in the present study. Although their interaction was not reported, the opposed expression trend and roles of LINC00536 indirectly supported its potential relationship with miR-508. LINC00536 was identified to be highly expressed in bladder cancer tissues compared with controls and negatively associated with patients' survival rate. Function assays indicated that the knockdown of LINC00536 attenuated the malignant cell phenotypes in vitro and inhibited bladder cancer growth in vivo [42].
e present study showed that prognostic miR-506 was involved in OvCa by impacting the LINC00475-miR-506-TMEM129 ceRNA axis. is was also a novel mechanism explained for OvCa because no study reported their interactions previously, except that the expression level of each gene was preliminarily investigated as follows: Nam et al. revealed that the expression of miR-506-3p was significantly decreased in recurrent samples of OvCa patients compared with normal ovarian tissue and primary tumors [43]. e high level of miR-506 was positively associated with the early FIGO stage and longer survival in OvCa [44]. e introduction of miR-506 in OvCa cells can inhibit its proliferation and dissemination and promote senescence [44,45]. Mo et al. used the lncRNA microarray to identify the high expression of LINC00475 in gastric cancer tissues compared with paired nontumor tissues [46]. e study of Hou et al. revealed that LINC00475 was associated with the overall survival of patients with clear cell renal cell carcinoma [47]. Silencing of LINC00475 suppressed cell proliferation, migration, and invasion in renal cell carcinoma [48] and glioma cells [49] in vitro. TMEM129 encodes an E3 ubiquitin ligase that was reported to mediate endoplasmic reticulum-associated HLA class I degradation [50] while tumors with downregulation of MHC class I conferred a significantly poorer prognosis in OvCa patients [51].
ese findings suggested that TMEM129 may be upregulated in OvCa. In line with these studies, we also found that miR-506 was downregulated and LINC00475 and TMEM129 were upregulated in recurrent samples compared with nonrecurrent controls.
As for the prognostic genes not explored in the ceRNA network, most of them had been shown to be associated with the development and progression of OvCa or other cancers previously in the literature. For example, C1QTNF3 (C1q and TNF related 3) was found to be upregulated in bowel metastasis samples than primary OvCa [52]. ARID3A (ATrich interactive domain 3A) positivity was implied to be correlated with longer DFS and cancer-specific survival in patients with residual rectal cancer [53]. MAGI3 (PDZ domain-containing protein membrane-associated guanylate kinase inverted 3) was identified to be downregulated in glioma samples. Overexpression of MAGI3 inhibited proliferation, migration, and cell cycle progression of glioma cells and decreased subcutaneous tumor growth in mice by inactivation of Wnt/β-catenin signaling pathway. High expression of MAGI3 was also significantly associated with excellent overall survival [54]. TNFAIP8L3 (TNF alphainduced protein 8 like 3, also known as TIPE3), a transfer protein of phosphoinositide second messengers, was detected to be significantly upregulated in breast cancer tissues (especially invasive or metastasized type) as compared with adjacent nontumor tissues. Inhibition of TNFAIP8L3 may block cancer cell proliferation, migration, and invasion in vitro and in vivo by inactivation of the AKT and NF-κB signaling pathways [55,56]. In consistence with these studies, we also found that C1QTNF3 and TNFAIP8L3 were upregulated, while ARID3A and MAGI3 were downregulated in recurrent samples, and they were respectively an oncogenic or tumor suppressor gene for prediction of DFS. Although there were also studies to confirm the roles of Gli2 (Glioma-associated oncogene family member 2) [57], PDK1 (pyruvate dehydrogenase kinase) [58], NDUFAF1 (NADH dehydrogenase 1 alpha subcomplex assembly factor 1) [59], SLAMF7 (SLAM family member 7) [60], and CH25H (cholesterol 25-hydroxylase) [61], their conclusions seemed to be opposite with our study, which may be attributed to the differences in cancer type or sample size and thus further validation experiments are needed. PLEKHG4B was not investigated in any study of cancer, but we speculated that its expression may be positively regulated by LINC00598 because of their PCC of 0.43. Knockdown of LINC00598 was       previously proved to induce G0/G1 cell cycle arrest and inhibit proliferation, indicating its tumor-promoting roles [62]. Accordingly, we speculated that PLEKHG4B was also highly expressed in OvCa, which was verified in our study as expected.

Conclusion
In the present study, we used the recurrence-associated genes to construct a multi-RNA-based risk score model which was shown to effectively stratify OvCa patients into groups at low risk and high risk of shorter DFS. is model was not only independent of clinical features but also superior to commonly used pathologic staging in the clinic for prognostic prediction. Furthermore, we predicted several new interaction axes to explain the possible mechanisms of these RNAs in OvCa, such as TM4SF1-AS1-miR-186-STEAP2, LINC00536-miR-508-STEAP2, LINC00475-miR-506-TMEM129, and LINC00598-PLEKHG4B. However, some limitations still exist. First, quantitative PCR experiments should be performed to validate the expression and prognosis accuracy of our signature genes in clinical samples because there may be potential differences with TCGA sequencing data results as reported for Gli2, PDK1, NDU-FAF1, SLAMF7, and CH25H. Second, in vitro and in vivo experiments are also essential to validate the ceRNA or coexpression mechanisms we identified.

Data Availability
All data can be available in TCGA database (https://portal. gdc.cancer.gov/).

Conflicts of Interest
e authors declare that they have no conflicts of interest.

Authors' Contributions
Yu Zhang and Qingjian Ye contributed equally to this work. YZ, QJY, YBY, and XML conceived the study and participated in its design. YZ and QJY collected the data and performed bioinformatic analyses. JXH, PGC, JW, and JL searched the references and were involved in the interpretation of the results. YZ and QJY drafted the manuscript. YBY and XML revised the manuscript. All authors have read and approved the final version of the article.