Stem Cell-Associated Signatures Help to Predict Diagnosis and Prognosis in Ovarian Serous Cystadenocarcinoma

Ovarian serous cystadenocarcinoma (OV) is a fatal gynecologic cancer with a five-year survival rate of only 46%. Resistance to platinum-based chemotherapy is a prevalent factor in OV patients, leading to increased mortality. The platinum resistance in OV is driven by transcriptome heterogeneity and tumor heterogeneity. Studies have indicated that ovarian cancer stem cells (OCSCs), which are chemoresistant and help in disease recurrence, are enriched by platinum-based chemotherapy. Stem cells have a significant influence on the OV progression and prognosis of OV patients and are key pathology mediators of OV. However, the molecular mechanisms and targets of OV have not yet been fully understood. In this study, systematic research based on the TCGA-OV dataset was conducted for the identification and construction of key stem cell-related diagnostic and prognostic models for the development of multigene markers of OV. A six-gene diagnostic and prognostic model (C19orf33, CBX2, CSMD1, INSRR, PRLR, and SLC38A4) was developed based on the differentially expressed stem cell-related gene model, which can act as a potent diagnostic biomarker and can characterize the clinicopathological properties of OV. The key genes related to stem cells were identified by screening the genes differentially expressed in OV and control samples. The mRNA-miRNA-TF molecular network for the six-gene model was constructed, and the potential biological significance of this molecular model and its impact on the infiltration of immune cells in the OV tumor microenvironment were elucidated. The differences in immune infiltration and stem cell-related biological processes were determined using gene set variation analysis (GSVA) and single-sample gene set enrichment analysis (ssGSEA) for the selection of molecular treatment options and providing a reference for elucidating the posttranscriptional regulatory mechanisms in OV.


Introduction
Ovarian serous cystadenocarcinoma (OV) is a fatal gynecologic cancer with a five-year survival rate of only 46% [1,2]. Most OV patients are resistant to platinum-based chemotherapy, resulting in increased mortality. Platinum resistance in OV is driven by transcriptome and tumor heterogeneity [3]. A previous study has shown that ovarian cancer stem cells (OCSCs), which are chemoresistant and responsible for disease recurrence and relapse, are enriched by platinum-based chemotherapy [4]. Stem cells are critically involved in the prognosis of OV patients and are key pathology mediators of OV [5]. However, the molecular mechanisms and targets of how stem cell-associated genes affect ovarian serous cystadenocarcinoma have not been fully elucidated. In the last few years, several research studies have proven that many proteins, different dysregulated genes, and some other molecular substances in OV may serve as important diagnostic markers and treatment targets. Considering the key role of stem cells in regulating pathological changes in OV, investigating the clinical and biological significance of stem cell-associated genes in OV may also lead to the advancements in OV molecular diagnosis and antitumor therapy.
This study is aimed at identifying and assessing the multigene markers for molecular diagnosis of OV based on the systematic study of differential stem cell-associated genes in OV. Moreover, the prognostic impact of gene models on OV patients and the underlying molecular mechanisms of immune microenvironment infiltration were also analyzed. Further, this study revealed the underlying functional and molecular pathways of immune infiltration and stem cellassociated gene models based on the CIBERSORT algorithm and GSVA. The stem cell-associated gene diagnostic features were established to help in OV diagnosis and investigate the potential biological and prognostic significance of stem cells by applying the LASSO regression analysis. In addition, a TF-miRNA molecular network targeting the 6-gene model was also constructed to investigate the regulatory mechanism of posttranscriptional and molecular mechanisms underlying OV.

Identification of Stem Cell-Related Marker Models.
In the TCGA-OV dataset, the differentially expressed genes (DEGs) between OV and healthy control tissue samples were identified based on the absolute value of log2Fold-Change of more than 0.5 and a false discovery rate (FDR) of less than 0.05 [6]. The stem cell-related biological properties in OVs were examined by using "stem cells" as the keyword and "C7 IMMUNESIGDB" as the filter condition in the molecular signature database (MSigDB, http://www .gseamsigdb.org/gsea/msigdb/ index. jsp), and 26 stem cellrelated gene sets were obtained (Table 1). 200 stem cell-   3 Stem Cells International related markers were identified in the GSE23321 dataset using comparative screening [7].

Construction of Stem Cell-Based Diagnosis and Prognosis
Model of OV. As a result of the differences in the influence of the stem cell-associated molecular mechanisms, the healthy control and OV patient samples may have different stem cell states. Therefore, it is extremely feasible to construct diagnostic models based on necroptosis-associated genes. The stem cell-related biological properties in OVs were examined by using "stem cells" as the keyword and "C7 IMMUNE-SIGDB" [6,8] as the filter condition in MSigDB [6,7], and 26 stem cell-related gene sets were obtained. 200 stem cellrelated markers were identified in the GSE23321 dataset using comparative screening [7].

Molecular Diagnostic Efficacy Evaluation.
Single-gene receiver operating characteristic (ROC) curves were plotted using the R package pROC to confirm the robustness of the diagnostic model and to calculate the area under the curve (AUC) [9]. Subsequently, the diagnostic performance of the model based on clinical characteristics such as sex, tumor stage, age, grade, metastatic status, DSS (disease-specific survival), OS (overall survival), and PFI (progressionfree interval) was evaluated, revealing the association of six stem cell-related marker models in OV with OV prognosis. Time-dependent ROC curves were constructed using the R package rms for the three genes with the most significant weights obtained from the LASSO model, taking into account age and sex information and then visualized using nomograms. The nomogram was validated by measuring the calibration curves using an internal dataset (training set) and an external dataset (validation set) [10].

Gene Enrichment
Analysis. Gene Ontology (GO) enrichment analysis was used for the comprehensive functional enrichment study of genes at different levels in order to gain insight into the biological significance of the six stem cellrelated marker models in OV [11]. The GO analysis was conducted at three levels, namely, cellular component (CC), biological process (BP), and molecular function (MF), at which the analysis was done. The renowned Kyoto Gene and Genome Encyclopedia Genomes (KEGG) database provides information on biological processes, genomes, diseases, and drugs [12]. All the significant DEGs were annotated with the GO function using the R package cluster-Profiler to identify highly enriched biological processes. The enrichment results were visualized using the R package GO plot, and p < 0:05 was considered as the significance threshold for enrichment analysis.         7 Stem Cells International under immune cell infiltration in OV was investigated by using the CIBERSORT algorithm for deconvolution of the transcriptome expression data based on the principle of linear support vector regression and estimation of the composition and abundance of immune cells from the mixed cell population [14]. The samples with p < 0:05 were filtered out to derive an immune cell infiltration matrix [15,16].
2.6. Gene Set Variation Analysis. GSVA [17] is both a nonparameterized and unsupervised method for gene set analysis that uses transcriptome data for predicting the scores of particular pathways or signatures. GSVA is also used for predicting variations in gene set enrichment using expression datasets. GSVA converts the relevant data from a gene-sample matrix to a gene-sample matrix, allowing the pathway enrichment of each sample to be evaluated. GSVA enables the pathway-centric use of standard research techniques, including survival analysis, cluster analysis, and functional enrichment. At the same time, GSVA was per-formed between groups using the background set based on "msigdb.v7.0.symbols.gmt" and the phenotype characteristic gene set [18].

2.7.
Quantitative RT-PCR. Total RNA was isolated from normal tissue and tumor tissue from ovarian serous cystadenocarcinoma patients using TRIzol reagent (Beyotime, China). Then, the RNA from each sample (2 μg) was obtained. Subsequently, PCR reactions were performed on Roche LightCycler 480 PCR system (Roche) using SYBR Green Master (Roche). CT values were calculated for all samples using the 2-ΔΔCT method and normalized using the levels of (GAPDH). The primer pairs for the target genes were the following: C19orf33: forward primer TTACCGCCATGGAGTT CGAC and reverse primer CCCTGAAGTTGGAGGCCTTT SLC38A4: forward primer CCCCACTCACACAGAA CAGAG and reverse primer CAGCGCTTTCTTGTCC ACAC  Mann-Whitney U test (i.e., the Wilcoxon rank sum test) was conducted to assess variance between nonnormally distributed variables, and the independent Student's t test was used to identify the significance of statistics for normally distributed variables and to compare two consecutive datasets. The chisquare test or Fisher's exact test was used for comparing and assessing the statistical significance between two sets of categorical variables. The Kruskal-Wallis test was used to compare two or more groups, while the Wilcoxon test was used to compare two groups. The pROC package in R was used to plot ROC curves and obtain AUC to determine the accuracy of the risk scores in determining patient prognosis. All the statistical p values were two-sided, and p < 0:05 was considered statistically significant.

Analysis of Hub Stem Cell-Related Differential Genes.
The DEGs (jlogFCj > 0:5 and Padj < 0:05) were initially ana-lyzed in tumor and normal tissues in the TCGA-OV database, which were demonstrated by volcano plots (Figure 1(a)). The stem cell-related biological properties in OVs were investigated by using "stem cells" as the keyword and "C7 IMMUNESIGDB" as the filter condition in MSigDB to obtain 26 stem cell-related gene sets. 200 stem cell-related markers were identified in the GSE23321 dataset by comparative screening. These gene sets were intersected with DEGs (jlogFCj > 0:5 and Padj < 0:05) in tumor tissues and healthy tissues in the TCGA-OV dataset ( Figure 1(b)).

Construction of Diagnostic and Prognostic Gene
Signatures Using LASSO Regression Analysis. Due to the biological significance of stem cells in tumor growth, a diagnosis and prognosis model for predicting OS in OV patients was constructed based on the above 9 stem cellrelated hub genes. First, screening was performed using LASSO regression analysis to determine 6 genes with statistical significance based on the optimal lambda value. Thus, a 6-gene diagnostic model consisting of C19orf33, CBX2, CSMD1, INSRR, PRLR, and SLC38A4 was established by screening. This can play an important role in the process of diagnosing OV. The lambda and min values were computed using the logistic LASSO regression algorithm by specifying the loss function for LASSO to achieve a steady state (Figures 2(a) and 2(b)). Subsequently, the impact of these gene-based diagnostic models on the prognosis and OS of OV was visualized using a forest plot (Figures 2(c)-2(e)).  The PCA dimensionality reduction was applied to examine the differences in expression of 6 stem cell-related markers in OV (Figures 3(a) and 3(b)), indicating that the genetic models can help to identify OV patients to a certain extent. The differential expression of 6 stem cell-related markers in OV was significantly represented by box plots (Figure 3(c)). Subsequently, a heatmap was used to represent GSVA based on the immunological nonparametric and unsupervised gene sets that can estimate the scores for immune-related pathways or transcriptome signatures. A heatmap indicated that the stem cell-related marker models demonstrated considerable differences in enrichment between the low-risk group and high-risk group of OV patients (Figure 3(d)).
The expressions of six stem cell-related gene models were further analyzed. The correlations between expressions of different molecules were demonstrated by correlation network graphs and correlation heatmaps (Figures 4(a) and 4(b)). The results indicated a strong relationship between the six molecules, with INSRR, PRLR, and SLC38A4 as the most closely related. Subsequently, the histogram  14 Stem Cells International representing the differences in gene expression indicated that C19orf33 and CBX2 were highly expressed in OV tissues relative to normal tissues, while lower expression levels of CSMD1, INSRR, PRLR, and SLC38A4 were observed in OV tissues. In addition, the overall expression levels of C19orf33 and CBX2 were significantly higher than those of the remaining molecules (Figures 4(c)-4(h)). And we examined the expression levels of our selected gene markers using RT-PCR ( Figures 5(a)-5(f)). Therefore, it was speculated that among the stem cell-related OV diagnostic gene signatures, C19orf33 and CBX2 have higher potential applications.

Functional Enrichment of Stem Cell-Related Marker
Models. Subsequently, a functional enrichment analysis of 6 gene signatures significant for OV diagnosis and identifica-tion of poor OS prognosis was performed to understand their potential applications in human health (Figures 6(a)-6(h) and Table 1). The enrichment analysis using GO and KEGG was conducted on 6 gene models for OV identification and prognosis prediction with the terms and pathways having p < 0:05 considered significantly enriched.   [19,20]. At the same time, some studies have indicated that the immune microenvironment, such as T cell activation, can be regulated via JAK/STAT molecular pathway, thereby regulating stem cell growth [21,22].    Table 2). The independent analysis of the role of 6 stem cell-associated signatures on the prognosis of OV patients revealed that CBX2 (HR = 0:867, 95% CI (0.760-0.988)) may be a protective indicator for OV, while INSRR (HR = 0:988, 95% CI (1.267-3.748)) and SLC38A4 (HR = 1:399, 95% CI (1.020-1.920)) may be prognostic risk factors for OV patients (Figures 11(c) and 11(d) and Table 3). Figure 12 shows the differential expression between samples of different clinical subgroups, namely, lymphatic invasion (Figure 12(a)), tumor residual (Figure 12    Nomograms were developed for predicting the OS among OV patients based on the RMS library. The prediction performance of the nomogram is gauged by the consistency index of the calibrated graph and is assessed by comparing the probability of nomogram prediction to the probability of observed survival. For each patient, the scores for the corresponding variables were calculated and added. The predicted marginal positivity rate can be estimated from the total score of each patient. The nomograms were developed for predicting OV patient prognosis at 1, 3, and 5 years (Figure 13(d)). As shown in Figure 13(e), the calibration plot indicated an excellent degree of agreement between the nomogram predicted and observed results.

Stem Cells International
3.9. Analysis of Immune Cell Infiltration. To investigate the effect of diagnostic marker models on immune cell infiltration in the OV microenvironment, the composition of immune cells in mixed cells was estimated with the CIBERSORT algorithm, which deconvolutes the transcriptome expression matrix according to linear support vector regression and the principle of abundance. The samples with p < 0:05 were filtered out to derive an immune cell infiltration matrix [15]. The respective immune cell infiltration levels in the histograms were assessed using the CIBERSORT algorithm and ssGSEA, showing differences in the levels of various immune cell infiltrations between the low and high expression groups of C19orf33, CBX2, CSMD1, INSRR, PRLR, and SLC38A4 ( Figure 14). The results showed that all six stem cell-related markers were significantly associated with the distribution of different types of immune infiltrating cells, especially CBX2 and INSRR (Figures 14(b) and 14(d)).
Further, the correlation between stem cell-related markers and the degree of immune cell infiltration was analyzed to understand the role of genes in infiltrating the OV immune microenvironment, thereby leading to a poor prognosis (Figures 15(a)-15(f)). The results revealed a substantial negative relationship between stem cell-related markers and the infiltration levels of immune cells (Figures 16(a)-16(r) and Table 4). We can observe that the elevated expression of stem cell-associated genes correlates with a decrease in the infiltration of various types of immune cells, with T cell subtypes, macrophages, and dendritic cells showing a significant decrease, exhibiting a wide range of suppression from the innate to the adaptive immune system. In other words, immune cell infiltration in the tumor microenvironment is relatively reduced at high gene expression, indicating that insufficient immune cell infiltration results in rapid tumor progression and poor prognosis.

Discussion
In the past few years, several research studies have demonstrated that many proteins, dysregulated genes, and other molecular substances in OV may serve as crucial diagnostic and therapeutic targets. Since stem cells are closely tied to the pathological changes in OV, evaluating the clinical and biological significance of stem cell-related genes in OV may also lead to advancements in molecular diagnosis and antitumor therapy of OV. The study of stem cell-related genes and tumor development has gradually become a hot topic of research. They can promote tumor development by regulating immunosuppression, promoting vascular   A comprehensive study based on the TCGA-OV dataset was conducted for the identification and construction of key stem cell-related diagnostic and prognostic models for the development of multigene markers of OV. The key stem cell-related genes were identified by analyzing the genes differentially expressed in normal and OV tissues. The stem cell-related gene diagnostic signatures were established using machine learning, which can help in OV diagnosis and evaluate the potential biological significance of molecular models and their impact on the infiltration of immune cells in the OV tumor microenvironment. The variations in the immune infiltration and stem cell-related biological pathways were determined using ssGSEA and GSVA. They can help in the selection of molecular treatment options and provide a reference for elucidating the posttranscriptional regulatory mechanisms underlying OA. 26 related pathways were obtained by a comprehensive search of [7]

26
Stem Cells International of 9 differential stem cell-related genes were obtained by the intersection of the genes identified by MSigDB with the OVrelated DEGs analyzed by TCGA-OV and GTEx databases. Among these 9 differential stem cell-related genes, a gene model affecting the prognosis and survival of OV patients was developed using LASSO regression analysis based on the OS status and survival time of OV patients. Finally, a 6-gene prognostic model consisting of C19orf33, CBX2, CSMD1, INSRR, PRLR, and SLC38A4 was obtained. Based on the differential gene expression analysis, survival analysis, and multivariate and univariate Cox regression analyses in combination with clinical variables, it was verified that the 6-gene prognostic model can be a potential risk factor for poor prognosis of OV patients. The enrichment analysis using GO and KEGG indicated that the stem cell-related markers may influence the prognosis of OV as the JAK-STAT signaling pathway can play a driving role in affecting the ability of cancer stem cells. C19orf33 and its closely related markers have significant roles in multifocal and multicentric breast cancer (MMBC) and, therefore, can be used as markers for MMBC [6]. Previous studies have revealed a relationship between the immune microenvironment of gastric cancer and CBXrelated prognostic gene signatures. Expression levels of mRNA and protein of CBX2/3 were greatly increased in gastric cancer patients. However, the mRNA and protein expression levels of CBX6/7 were lower compared to CBX2/3 [23]. The binding of CBX2 to K27 trimethylated oligonucleosomes has prognostic significance for tumors [24], and CBX2 shapes chromatin accessibility promoting AML via p38 MAPK signaling [25]. CSMD1 is also strongly linked to the occurrence of various tumors and the immune micro-environment [26,27]. Previous studies have demonstrated that CSMD1 can restrict cancer progression by inhibiting esophageal squamous cell carcinoma proliferation, epithelial-mesenchymal transition, chemotherapy resistance, and inducing immunosuppression [28]. Several studies have shown that INSRR is an insulin receptor-related protein that can enhance the proliferation of several human breast cancer cell lines, possibly because of its synergistic effect with estrogen and insulin-like growth factor (IGF) [29,30]. The role of PRL and PRL receptors (PRLR) in tumor progression and tumorigenesis is well known [31,32], and several studies have manifested the regulatory involvement of PRLR in medication responsiveness and the prometastatic effect of PRL on breast cancer and other gynecological cancers [33,34]. In addition, studies have confirmed that SLC38A4 is a prospective biomarker with therapeutic goal that exerts tumor suppressive effects in hepatocellular carcinoma by modulating the Wnt/β-catenin/MYC/HMGCS2 axis [35]. It can be observed that the clinical significance of the stem cell-related six-gene risk model established in this study for cancer was verified by most of the experiments.
For further investigation of the independent prognostic factors dependent on the expression of stem cell-related marker models and affecting the OS of OV patients, the univariate and multivariate Cox regression analyses were conducted by combining the multiple clinical variables, indicating that CBX2 (HR = 0:867, 95% CI (0.760-0.988)) may be a protective factor for OV, while INSRR (HR = 2:179, 95% CI (1.267, 3.748)) and SLC38A4 (HR = 1:399, 95% CI (1.020, 1.920)) can be prognostic risk factors for OV. Subsequently, the differences in the gene expression among clinically distinct subgroups were

30
Stem Cells International analyzed. The results from this study were found to be consistent and indicated that C19orf33, CSMD1, INSRR, PRLR, and SLC38A4 may be potential risk factors for poor prognosis in OV patients, while CBX2 may be a potential protective factor for OV. C19orf33 and its closely related markers have important roles in MMBC and can be used as markers for MMBC [36]. The influence of the diagnostic marker model on the immune cell infiltration in the OV microenvironment was investigated by estimating the immune cell composition in mixed cells using the CIBERSORT algorithm based on the principle of linear support vector regression and deconvolution of the transcriptome expression matrix [37,38]. The samples with p < 0:05 were filtered out to derive an immune cell infiltration matrix. All the stem cell-related marker models were associated with the differences in the levels of OV immune infiltration cells. Further correlation analysis demonstrated that the stem cell-related markers were mostly negatively linked to the degree of immune cell infiltration, indicating that insufficient immune cell infiltration can lead to rapid tumor progression and poor prognosis.
Therefore, it is speculated that the six-gene risk model consisting of C19orf33, CBX2, CSMD1, INSRR, PRLR, and SLC38A4 is associated with the activation and migration of OV and is a potential biomarker and therapeutic target in OV.

Data Availability
All data generated or analyzed during this study are included in this article.