mRNAsi Index-based Integrative Analysis for the Identication of New Prognostic Models for Ovarian Cancer

Ovarian cancer (OC) has the highest mortality rate among all female reproductive system malignant tumors worldwide. In this study, we aimed to investigate OC from several perspectives by using machine learning. Our results showed that the mRNA expression-stemness index (mRNAsi) is closely related to clinical characteristics of OC patients, as OC patients with venous or lymphatic invasion had higher mRNAsi score compared to patients with no invasion. Furhter grade 3/4 patient group had higher mRNAsi scores compared to the grade1/2 group. We also found that mRNAsi is closely related to immune inltration in OC. We also built a competing endogenous RNA network, which contained 4 miRNAs, 5 lncRNAs, and 1 mRNA, by using Cytoscape based on the differentially expressed genes of the high- and low-mRNAsi groups. Through Lassio regression, we also established a model including 7 lncRNAs and 2miRNAs, which could effectively categorize OC patients into two groups based on the median risk score. We then developed a nomogram model which could effectively forecast the overall survival rate of OC for 1-, 3-, and 5-year period. The models assessed in this study showed potential for clinical application in treatment decisions for OC.


Introduction
Ovarian cancer (OC), a "silent killer", is the deadliest female reproductive system cancer [1,2] as well as the second most common gynecological malignant tumor in females over 40 worldwide. OC incidence in developing countries in Asia and the Middle East is a little lower than it in the developed countries in the North America and Europe [3]. At diagnosis, OC is usually almost at an advanced stage with poor prognosis. Over 70% of OC patients are at stage III or IV wat the time of diagnosis [2,4], and this is due to the absence of particular symptoms as well as early metastasis [5,6]. The overall long-term prognosis of advanced OC has remained poor over the last 30 years [2,7] despite the ongoing development of both surgical and chemotherapy treatment. The 5-year overall survival rate of patients diagnosed at stage I is 90%, at stage II-65%, at stage III-34%, and stage IV-15% [8]. In addition, OC has high recurrence, with 75% of patients with advanced OC experiencing a relapse within 3 years and follow-on chemoresistance making OC treatment even more di cult. Various research has described the signi cant role of cancer stem cells (CSCs) in the recurrence, metastasis, heterogeneity, and multidrug resistance of OC [9].
CSCs were rst identi ed and isolated from leukemia in 1994 [10,11]. Since then, they have been the subject of various investigations worldwide. CSCs are described as the "heart" of cancers, as they have the capacity of self-renewal and can differentiate into cancer cell lineages [12,13]. In 2005, Bapat and colleagues found that CSCs also exist in OC [14]. Subsequent studies revealed that CSCs have signi cant roles in the initiation, metastasis, and chemoresistance of OC. However, despite extensive studies, various aspects of these roles of CSCs remain unclear. Thus, a better understanding of the nature of OC stem cells could contribute to the development of new treatment strategies. Tumor-in ltrating immune cell (TIICs), as members of the tumor microenvironment, also play a critical role in the pathogenesis of OC [15,16]. In general, there are 22 functional immune cell types including monocytes, B cells naïve, M1 macrophages, CD8 T cells, natural killer cells (NK cells), and others. Whether there is a relationship between OC stemness and tumor immune cell in ltration remains ambiguous.
Machine learning can be applied not only for search engines and speech recognition [17], but also in molecular biology. Unsupervised learning, a kind of machine learning, is widely used as a bioinformatics method for medical data mining. The one-class logistic regression (OCLR), a type of unsupervised machine learning introduced in 2016 [18], is used to study the features of cell subtypes by deconvolution of high-throughput tumor biopsy data. The mRNA expression-stemness index (mRNAsi) quanti es the characterization of CSCs through the extraction of transcriptomic features from pluripotent stem cell samples via the OCLR algorithm. In general, higher mRNAsi scores indicate malignant biological processes and a poor prognosis. Novel biomarkers of OC stem cells can be identi ed for mRNAsi through the use of bioinformatics. In order to relate this to tumor immune in ltration, CIBERSORT algorithm can be employed to infer the relative proportions of different kinds of tumor-in ltrating immune cells.
Long non-coding RNAs (lncRNA) exert esstntial effects for the initiation and progression of tumors. lncRNA competes with endogenous RNAs through binding to microRNA (miRNA) and then regulates the expression of certain mRNA transcripts. Although there have been several studies ron the role of lncRNAs in OC [19][20][21], a competing endogenous RNA (ceRNA) network based on OC stemness has not yet been reported.
In the current study, OC cases downloaded from The Cancer Genome Atlas (TCGA) were divided into two groups, namely OC with a high and a low mRNAsi score (high-mRNAsi and low-mRNAsi, respectively). A ceRNA network was built on the basis of differentially expressed miRNAs, lncRNAs, and mRNAs selected by comparing the two groups (high-mRNAsi vs low-mRNAsi). The the correlation of copy number variation of differentially expressed mRNAs and the mRNAsi score was then analyzed. The prognostic value of differentially expressed miRNAs, lncRNAs, and mRNAs was assessed by utilizing univariate cox proportional hazards regression analysis (Cox PHR). The lncRNAs, miRNAs, and mRNAs associated with prognosis were tted into LASSO regression to establish the risk regression model. In addition, a nomogram model based on the selected prognosis-related genes was built to facilitate clinical application. The clinical value of the nomogram model was evaluated through a calibration curve, DCA curve, and clinical impact curve. The immune in ltration pro le of the OC data was also analyzed to further investigate the relationship between immune in ltration pro le and the mRNAsi of OC.

Data collection
Transcriptome pro ling data, copy number variation data, and clinical characteristic of OC cases were obtained from the TCGA database (https://portal.gdc.cancer.gov/). Transcriptome pro ling data included the expression of lncRNAs, miRNAs, and mRNAs. Clinical characteristic included survival time, life-state, age, stage, grade, venous invasion, and lymphatic invasion. Transcriptome pro ling lncRNAs and mRNAs data included 379 OC samples, and transcriptome pro ling miRNAs data included 499 samples.
Analysis of the relationship between mRNAsi and clinical characteristics mRNAsi quanti ed the characterization of CSCs through extraction of the transcriptomic features from pluripotential stem cell samples (embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs)) based on the same metnod of Malta et al through use of the OCLR algorithm [22]. The Wilcoxon rank-sum test was used to analyze the relationship between mRNAsi and clinical characteristic including age, stage, grade, venous invasion and lymphatic invasion. A Kaplan-Meier (K-M) curve was plotted to estimate the prognostic value of mRNAsi for OC.
Identi cation of differentially expressed, miRNAs, lncRNAs, and mRNAs related to mRNAsi Transcriptome pro ling data was divided into two groups (high-and low-mRNAsi group) on the basis median mRNAsi score. The "limma" package was used to screen for differentially expressed of miRNAs, lncRNAs, and mRNAs. The miRNAs, lncRNAs, and mRNAs with false discovery rate (FDR) < 0.05 and |log2 (fold change) | >1 were sconsidered statistically signi cant. Finally, a volcano plot was drawn to visualize results.
Construction of the lncRNA-miRNA-mRNA ceRNA network The selected differentially expressed miRNAs, lncRNAs, and mRNAs were used for the construction of a network based on Pearson correlation analysis. The network was visualized by using the Cytoscape software. Subsequently, we re ned and simpli ed the network diagram according to the principle of ceRNA (lncRNA and mRNA have a negative correlation with miRNA, lncRNA has a positive correlation with mRNA). This simpli ed network was considered as the nal ceRNA network.
Analysis of the correlation between the mRNAsi and copy number variation of differentially expressed mRNAs Classically, copy number variation (CNV) refers DNA fragments that have been ampli ed or reduced, resulting in genomic alterations with a size of 1kb or more [23]. Previous studies have reported that CNV of genes is related to the progression of various cancers through in uences on gene expression level [24,25]. Therefore, we used Pearson correlation analysis to investigate whether there was relationship between gene expression level, mRNAsi, and the CNV of differentially expressed mRNAs. Establishment of the prognostic risk score model based on prognosis-related genes through Lasso regression We selected survival-related genes from the differentially expressed miRNAs, lncRNAs, and mRNAs via Univariate Cox PHR. The selected survival-related genes were tted into Lasso regression to screen candidate genes for establishing a prognostic risk score model by using the "glmnet" package in R.
Multivariate Cox PHR was applied to establish the prognostic risk score model based on the regression coe cient of candidate genes through the formula: Risk score= exp(X1) * β1 + exp(X2) * β2 + . . . . . . + exp(Xn) * βn. Exp(X) refers to the expression of the candidate gene, and β represents the regression coe cient of candidate genes. The accuracy of the model was veri ed by K-M) curve and ROC curve analysis via the "survival" package.
Establishment of a nomogram model based on genes related to prognosis A nomogram model was built by utilizing the "rms" package in order to more intuitively predict the overall survival of OC at 1, 3, and 5 years based on the prognosis-related genes. The clinical impact and predictive power of the nomogram model were assessed by calibration curves, a DCA curve, and a clinical impact curve.

Analysis of the relationship between mRNAsi and tumor in ltrating immune cells
The relative ratio of 22 tumor-in ltrating immune cell types was calculated following the method of Yue Gao et al, utilizing the CIBERSORT algorithm [26]. The Wilcoxon rank-sum test was employed to distinguish differential in ltration of immune cells between the high-and low-mRNAsi groups. Principal component analysis (PCA) was used to investigate the grouped samples. The relationship of mRNAsi and tumor in ltrating immune cells was explored via Pearson correlation analysis.

Identi cation of correlations between mRNAsi and clinical characteristics
The mRNAsi score of 302 OC cases were obtained by using the OCLR algorithm, and sample data was divided into two groups based on the median mRNAsi score ( Figure 1A). As shown in Figure 1B and 1F, we found no signi cant correlation between mRNAsi and the age or stage of OC cases, respectively. OC patients with venous or lymphatic invasion had higher mRNAsi scores compared to patients with no invasion (Figure 1C, D). Further, grade 3/4 cases had higher mRNAsi score compared to grade 1/2 cases (p 0.024) ( Figure 1E). The Kaplan-Meier survival curve ( Figure 1G) indicated that mRNAsi had no signi cant prognostic value for OC.
Analysis of the relationship between CALB2 and CHN2 copy number variation and mRNAsi score Figure 3E depicts the locations of CALB2 and CHN2 on the chromosome. Initially, the relationship between the gene expression level and CNV of CALB2 and CHN2 were analyzed. There was a positive correlation between the gene expression level and CNV of CALB2 and CHN2, which was consistent with our assumption (Figure 3A, B). Transcriptomic data indicated that the expression of CALB2 and CHN2 was lower in the high-mRNAsi group than in the low-mRNAsi group, and, based on this, we predicted that the mRNAsi score may have a negative correlation with the CNV of CALB2 and CHN2.
Thus, we investigated the relationship between mRNAsi score and the CNV of CALB2 and CHN2. The result indicated that the CNV of CHN2 was negatively correlated with the mRNAsi score ( Figure 3C). However, there was no statistically signi cant correlation between the mRNAsi score and the CNV of CALB2 ( Figure 3D).

Establishment of the prognostic risk score model by Lasso regression
Univariate Cox PHR was used to identify survival-related lncRNAs, miRNAs, and mRNAs from differentially expressed genes based on mRNAsi score grouping. In general, 60 lncRNAs and 3 miRNAs (p<0.05) were obtained, as shown in Supplementary Table 2. To further narrow down candidate survivalrelated genes, 11 candidate genes, including 9 lncRNAs and 2 miRNAs, were obtained by using the Lasso binary logistic regression model ( Figure 4A, B). Regression analysis of multivariate Cox proportional hazards was then used to further narrow the scope of candidate genes. 9 candidate genes, including 7 lncRNAs and 2 miRNAs, were obtained and used to establish the risk score model. The risk score model was then built with the addition of expression level results and the regression coe cient of each gene in the Lasso regression model: risk score = 0.00014*ENSG00000230601+0.000213*ENSG00000275850+0.000174*ENSG00000225807-0.00052*ENSG00000234754+5.66E-05*ENSG00000259645+4.27E-05*ENSG00000227068 +4.85E-05*ENSG00000249706 -0.02221*hsa-miR-133a-3p+0.099624*hsa-miR-6788-3 (Supplementary Table 3). OC patients with lower risk scores had better OC outcome, as shown in Figure 4C. The ROC curve had an area of under curve of 0.658, which indicated that the prognostic risk score model possessed good predictive power for prognosis ( Figure 4D). The risk score of each OC patient was calculated based on expression of the 9 selected genes. The survival status, risk score distribution, and expression pattern of the 9 genes are depicted in Figure 4E-G. I have changed this so that the way you write out Lasso is consistent throughout the text.
Construction and assessment of a nomogram for overall survival prediction for OC A nomogram model based on the 9 candidate genes was established for the prediction of overall survival of OC at 1, 3, and 5 years ( Figure 5A). The model's calibration curve for the probability of overall survival at 1, 3, and 5 years had accurate predictive capacity ( Figure 5B-D). A DCA curve was constructed to assess the nomogram model's clinical bene ts. We found that the nomogram model could be of bene t for predicting the 1-, 3-and 5-year overall survival based on the red oblique line of the DCA curve ( Figure 5 E-G). Finally, a clinical impact curve was plotted to visualize the nomogram model's clinical potential. The predicted number of high-risk patients was more than the high-risk number with an adverse event at 1, 3, and 5 years, indicative of the predictive power of the nomogram model ( Figure 5 H-J).

Assessment of the relationship between mRNAsi and tumor-in ltrating immune cells
We calculated the relative ratio of 22 tumor in ltrating immune cell types by using the CIBERSORT algorithm. We obtained 21 tumor-in ltrating immune cell types, as there was no signi cant fraction of naïve CD4 T cells. CIBERSORT results are presented in Figure 6A. We then investigated differential immune cell in ltration between the high-and low-mRNAsi groups. As shown in Figure 6B, fractions of follicular T helper cells, M1 macrophages, and activated dendritic cells were much higher in the high-mRNAsi group compared to the low-mRNAsi group. However, the fraction of resting dendritic cells followed the opposite trend. The other 17 in ltrating immune cell types showed no signi cant difference. As revealed by PCA analysis results presented in Figure 6C, tumor-in ltrating immune cell fractions could effectively discriminate OC patients belonging to the high-mRNAsi group and low-mRNAsi group. We explored the relationship between mRNAsi score and immune in ltration by Pearson correlation analysis.
The result indicated a positive correlation between mRNAsi score and plasma cells, activated dendritic cells, and follicular T helper cells ( Figure 7A, E, F). Moreover, mRNAsi decreased in parallel with the increase of resting dendritic cells, as well as M2 macrophages and resting mast cells ( Figure 7B, C, D).

Discussion
OC is the most deadliest cancer of the female reproductive system [1], and is usually diagnosed at an advanced stage with poor prognosis and high chance of disease recurrence [5]. CSCs play an important part throughout cancer progression and recurrence, thus also affecting prognosis [9]. However, the exact role of OC stem cells remains unclear. Tumor-in ltrating cells also play crucial roles in the pathogenesis of OC. Whether there is a relationship there between OC stemness and tumor-in ltrating immune cells is currently unclear. lncRNAs play a signi cant part in the progression of malignant tumors. While studies have reported several lncRNAs with roles in OC pathogenesis, a ceRNA network based on OC stemness has not been previously reported. In order to better understand OC, elucidation of the previously mentioned unclear aspects is urgently needed. In this study, we utilized unsupervised learning in order to gain insights into the relationship between stemness, prognosis, and immune in ltration in OC.
Several studies have reported that mRNAsi could be used for cancer prognosis [27,28] and the exploration of novel cancer biomarkers [29,30]. Our results revealed that OC patients with venous or lymphatic invasion had higher mRNAsi scores compared to patients without invasion ( Figure 1C, D). In addition, grade 3/4 patients had higher mRNAsi scores compared to grade 1/2 patients ( Figure 1E).
However, no signi cant difference in prognosis was observed between the high-and the low-mRNAsi group, which is different from observations in other kinds of cancer [31]. This may due to differences between tumor types or limitations deriving from the current sample size, and requires further investigation. As lncRNAs participate in various aspects of the malignant tumor progression, we screened for differentially expressed lncRNAs, miRNAs, and mRNAs based on the mRNAsi grouping by using the "limma" package. Finally, a ceRNA network containing 5 lncRNAs (ENSG00000230333, ENSG00000227220, ENSG00000234323, ENSG00000233821, ENSG00000266411), 4 miRNAs (hsa-miR-141-5p, hsa-miR-19b-1-5p, hsa-miR-30e-5p, hsa-miR-s29a-5p), and 1 mRNA (CHN2) was constructed by using Cytoscape (Figure2E). We also found that the CNV of the differentially expressed gene CHN2, was negatively correlated with the mRNAsi score ( Figure 3C). CHN2 expression, the GTPase-activating protein of p21-rac, is expected to result in higher Rac activity and, therefore, may play a part in the progression from low-grade to high-grade tumors [32,33]. CALB2, also referred to as Calretinin, is involved in cancer progression, including that of OC. Rathore et al. [34] reported that Calretinin was expressed in 29/31 sex cord stromal tumors, and Yishu Yuan et al. reported that Calretinin and P53, together with PAX8, had a synergistic effect on the differentiation of epithelial OC cells [35] Univariate Cox PHR was used to identify survival-related lncRNAs, miRNAs, and mRNAs of the differentially expressed genes based on mRNAsi score grouping. We then established a model by Lasso regression, including 7 lncRNAs and 2 miRNAs, for predicting the prognosis of OC patients. The model could e ciently classify OC patients into two groups, a high-and a low-risk group ( Figure 4E-D). To further assess whether the 9 candidate genes could be utilized to predict the overall survival rate of OC at 1, 3, and 5 years, we developed a nomogram model. The nomogram model could remarkably predict overall survival of OC at 1, 3, and 5 years. To our best knowledge, such a 9-gene predictive model has not been reported previously. This model may be of use for the identi cation of novel prognostic markers for OC. However, it is still early to presume that the model is superior to established prognostic examinations. Thus, further experimental validation is necessary. As the nomogram only covered 9 genes, clinical characteristics, including age, lymphatic or/and venous invasion, were not included and may also require further investigation in relation to their prognostic value. Multicenter prospective cohort studies with-large sample sizes are required before clinical implementation, as our research was only based on TCGA data with a relatively small sample size.
As various clinical trials of immunotherapy for OC are currently underway [36], we evaluated the potential relationship between mRNAsi and tumor-in ltrating immune cells on the basis of mRNAsi score grouping. Our results showed the fractions of follicular T helper cells, M1 macrophage, and activated dendritic cells were higher in the high-miRNAsi group compared to the low-mRNAsi group. The opposite was true for the fraction of resting dendritic cells. Through Pearson correlation analysis, we found a positive correlation between mRNAsi score and activated dendritic cells, plasma cells, and follicular T helper cells ( Figure 7A, E, F, Supplementary Table 4). Analysis also revealed a negative correlation not only between mRNAsi score and resting dendritic and mast cells, but also between mRNAsi score and M2 macrophages ( Figure  7B, C, D, Supplementary Table 4). Different tumor in ltrating immune cell types have various roles in cancer, as in the case of cancer-promoting M2 macrophages [37] as well as mast cells which not only enhance proliferation, but also improve the survival of naïve B and activated B cells [38]. This may partially explain why no strong correlation was found between mRNAsi score and OC prognosis.

Conclusion
The present study revealed that the mRNAsi score is closely related to immune in ltration in OC. We proposed a prognostic risk model established through Lasso regression, including 7 lncRNAs and 2 miRNAs, which had considerable predictive ability. We also developed a nomogram model, which could remarkably predict overall survival of OC after 1, 3, and 5 years. These models demonstrated potential for clinical application as part of the treatment process for OC.