Comprehensive genomic analysis of microenvironment phenotypes in ovarian cancer

Background Ovarian cancer is one of the leading causes of cancer-related death in women. The incidence of ovarian cancer is insidious, and the recurrence rate is high. The survival rate of ovarian cancer has not significantly improved over the past decade. Recently, immune checkpoint inhibitors such as those targeting CTLA-4, PD-1, or PD-L1 have been used to treat ovarian cancer. Therefore, a full analysis of the immune biomarkers associated with this malignancy is necessary. Methods In this study, we used data from The Cancer Genome Atlas (TCGA) database to analyze the infiltration patterns of specific immune cell types in tumor samples. Data from the Gene Expression Omnibus (GEO) database was used for external validation. According to the invasion patterns of immune cells, we divided the ovarian cancer microenvironment into two clusters: A and B. These tumor microenvironment (TME) subtypes were associated with genomic and clinicopathological characteristics. Subsequently, a random forest classification model was established. Differential genomic features, functional enrichment, and DNA methylation were analyzed between the two clusters. The characteristics of immune cell infiltration and the expression of immune-related cytokines or markers were analyzed. Somatic mutation analysis was also performed between clusters A and B. Finally, multivariate Cox analysis was used to analyze independent prognostic factors. Results The ovarian cancer TME cluster A was characterized by less infiltration of immune cells and sparse distribution and low expression of immunomodulators. In contrast, cytotoxic T cells and immunosuppressive cells were significantly increased in the ovarian cancer TME cluster B. Additionally, immune-related cytokines or markers, including IFN-γ and TNF-β, were also expressed in large quantities. In total, 35 differentially methylated and expressed genes (DMEGs) were identified. Functional enrichment analyses revealed that the DMEGs in cluster B participated in important biological processes and immune-related pathways. The mutation load in cluster B was insignificantly higher than that of cluster A (p = 0.076). Multivariate Cox analysis showed that TME was an independent prognostic factor for ovarian cancer (hazard ratio: 1.33, 95% confidence interval: 1.01–1.75, p = 0.041). Conclusion This study described and classified basic information about the immune invasion pattern of ovarian cancer and integrated biomarkers related to different immunophenotypes to reveal interactions between ovarian cancer and the immune system.


INTRODUCTION
Epithelial ovarian cancer is the main pathological type of ovarian cancer, and it has some histological subtypes, such as such as ovarian clear cell carcinoma, mucinous ovarian cancer and serous ovarian cancer (Lowe et al., 2013). Global results showed a wide range in ovarian cancer incidence rates, and ovarian cancer survival has shown modest improvement from a statistical perspective in the U.S ( Barnholtz-Sloan et al., 2003;Lowe et al., 2013;Park, Ruterbusch & Cote, 2017). The efficacy of immunotherapy can be further improved in ovarian cancer by analyzing the immune indexes of these patients to predict their responses to immune checkpoint blockade. The tumor microenvironment (TME) consists of a variety of innate and adaptive immune cells and a series of cytokines that regulate their responses to tumor cells. Solid tumors interact with infiltrating immune cells and inflammatory molecules in a dynamic equilibrium that affects disease development and outcome. Therefore, recently, several researches have investigated the functions of immune cells and cytokines in the TME in tumor development and/or tumor immune escape. These studies have also reported new methods of evaluating the infiltration of immune cells or factors in the TME (Lorenzo-Sanz & Munoz, 2019). These studies have proposed several immune signals that control the immune infiltration pattern of the TME, which can be used to accurately screen the immune responses to gastric cancer, non-small cell lung carcinoma (NSCLC) and other malignant tumors, and are potential therapeutic candidates (Bi et al., 2020).
In this study, we used the gene expression profile data of 782 ovarian cancer patients, including all histological subtypes, from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. According to the immune cell invasion and cytokine expression profiles of the TME, patients were divided into two different TME clusters. We found that the survival rate of patients in the low infiltration group (TME cluster A) was higher than that of the high infiltration group (TME cluster B). We also integrated biomarkers related to different immunophenotypes and analyzed genes with differential methylation levels, which provided evidence to better understand the interaction between ovarian cancer and the immune system. Our study aimed to develop an immune pattern and identify the immune signature of ovarian cancer.

Ovarian cancer expression data sets
We downloaded the level-4 mRNA sequencing data of ovarian cancer patients (presented as FPKM) and the related clinical data from TCGA database; these data were used as the discovery cohort. Patients were removed if their clinical and/or survival information were vague or unavailable. Additionally, we collected and enrolled the ovarian cancer patients' microarray data from several independent datasets in the GEO database, including GSE3149, GSE26712, and GSE63885. These gene expression data were all generated and annotated by platform Affymetrix Human Genome U133 Plus 2.0 (GPL570) or Affymetrix Human Genome U133a (GPL96). The COMBAT experience empirical Bayes approach in the sva package was used to eliminate the heterogeneous batch effect among different studies (Johnson, Li & Rabinovic, 2007). The background was adjusted and the data were normalized using the limma package (Ritchie et al., 2015).

Estimation and classification of TME cell abundances
We comprehensively analyzed and identified the published literature and ultimately utilized the gene signatures to build a compendium of microenvironment-associated genes (Bindea et al., 2013). The gene signature included 585 genes and represented 24 microenvironment cell subtypes from both innate and adaptive immunity, namely T cells, cytotoxic cells, CD8 T cells, as well as Tγ δ, T helper, Tcm, Tem, Th1, Th2, Th17, Tfh, Tgd, and Treg cells, B cells, eosinophils, macrophages, mast cells, neutrophils, dendritic cells (DCs), immature DCs, activated DCs (aDCs), natural killer (NK) cells, NK CD56-dim cells, and NK CD56-bright cells. The relative enrichment level of each immune cell subset was estimated based on the RNA expression data of each ovarian cancer patients using the Gene Set Variation Analysis (GSVA) algorithm in the GSVA package (Hänzelmann, Castelo & Guinney, 2013), which is a non-parametric and unsupervised analysis method that is primarily used to evaluate the gene set enrichment of nuclear transcriptome microarray data. It has been reported that GSVA outperforms single cell Gene Set Enrichment Analysis (ssGSEA) when calculating the signal-to-noise ratio in differential gene expression and differential signaling pathways because GSVA included normalization of gene expression to reduce the noise of the data (Tamborero et al., 2018). Finally, we obtained enrichment scores ranging from −1 to 1 for those 24 immune cell types, which indicated the relative enrichment of the immune cell infiltration.
An unsupervised clustering approach on the basis of Euclidean distance and Ward's linkage was used to cluster the ovarian cancers with distinct immune cell infiltration patterns. We use the ConsensusClusterPlus package containing 1,000 repeats to determine the optimal number of clusters for the ''Infiltrated group'' based on the percentage of data variance.
We determined the optimal clustering number of the ''Infiltration group'' on the basis of the percentage of variance of the data using the ConsensusClusterPlus package with 1,000 repeats (Monti et al., 2003). The distribution patterns of the 24 immune cell subtypes in each patient were produced by utilizing the pheatmap package.

Differential genomic features and functional enrichment analysis
The limma package was adopted to screen differentially expressed genes (DEGs) and miRNAs in different infiltration groups. The limma software package used Benjamin and Hochberg (BH) methods to estimate the gene expression profile through the moderation t -test. The p-value was adjusted to the false discovery rate (Ritchie et al., 2015). |Log (fold change) (logFC) >0.5 with adjusted p < 0.05 were considered the cut-off criteria for differential expression. The cluster analysis software package was adopted to analyze the functional enrichment of DEGs (Yu et al., 2012). Both Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms were identified strictly at p < 0.05. Additionally, the maftools software package based on the Kruskal-Wallis test was used to measure the mutation distribution of somatic cells and copy number variation detected by gistic. An adjusted p-value < 0.01 was regarded to be significant.

Analysis of DNA methylation data
We employed the TCGA DNA methylation obtained by the Illumina Human Methylation450 BeadChip array, which contains 485577 probes (approximately 396066 after filtering invalid probes) and covers 99% of RefSeq genes. The methylation level of the probe was quantified as β-values, which were defined as the ratio of the strength of the methylated and unmethylated alleles. The Champ pipeline was used to normalize 5 -c-phosphate-g-3 (CPG) methylation data of different infiltration groups, and the results were compared with those of DEG analysis. The threshold value for CpG sites was set as a corrected p-value of <0.05 and an absolute β-value of >0.2. In this study, when the number of differentially methylated CPGs in the promoter region of a gene was >1, it was considered a differentially methylated gene. We focused on the methylation of gene promoter regions, including TSS200, TSS1500, T -UTR, and 1st-Exon, since differentially methylated genes (DMGs) are commonly defined according to their promoter methylation status (Lee et al., 2004;Licchesi et al., 2008;Liu et al., 2008). According to both DMGs and DEGs, genes were classified into four categories: hypermethylated-upregulated (hyper-up), hypermethylated-downregulated (hyper-down), hypomethylated-upregulated (hypo-up), and hypomethylated-downregulated (hypo-down).

Establishing of the random forest classification model
Establishing the random forest classification model depended on the random forest algorithm and the random forest packets of thousands of classification trees or regression trees. First, we randomly divided the ovarian cancer patients in TCGA into training and experimental cohort at a 7:3 ratio. The number of variables (mtry) and the number of trees to grow (ntrees) were randomly allocated. The optimal value for the training group to achieve the lowest error rate was determined after 10,000 iterations. According to the above parameters, a random forest classification model was constructed. Additionally, the variable importance (VIMP) value was set to measure the prediction error rate of the model.

Statistical analysis
All statistical data and images were analyzed and plotted by R version 3.5.3 (R Foundation for Statistical Computing, Vienna, Austria) (Colaprico et al., 2016). Continuous variables between the high-and low-invasiveness groups were compared by Student's t -test and the Wilcoxon test. Categorical variables were analyzed by the chi-square test. Kaplan-Meier analysis and log-rank tests were performed using the ggplot2 package to analyze overall survival (OS). To estimate the prognostic value of invasion by different immune cell subtypes, the survminer software package was used to identify the cutoff value based on the correlation between OS and GSVA scores. Finally, the forestplot package was used to present the results. Two-sided p-value of < 0.05 were regarded as statistically significant in the chi-square test, correlation test, log-rank test, and Cox analysis.

Phenotypic landscape of the ovarian cancer microenvironment
The flow chart of this study is shown in Fig. 1A. There were 343 patients from TCGA enrolled in the discovery cohort and 439 patients from the GEO database enrolled in the validation cohort. We used a GSVA enrichment fraction containing multiple immune cell types to describe the relative abundance and comprehensively describe the immune invasion phenotype of each ovarian cancer patient. K-means clustering divided the 343 patients into TME cluster A (n = 170) and TME cluster B (n = 173) according to the dichotomy of the discovery cohort (TCGA data). The different groups reviewed the different immune infiltration patterns of each ovarian cancer patient's adaptive and innate immune system (Fig. 1B). TME cluster A was featured with less abundance of immune cells and a sparse distribution and low expression of immunomodulators. In contrast, the activation levels of innate and/or adaptive immune responses in TME cluster B were relatively high, indicating high infiltration of T cells, B cells, Tregs, TGD cells, cytotoxic cells, DCs, IDCs, ADCs, and macrophages (Figs. 1B and 1C).
Next, we explored the distribution of age and tumor stage between TME clusters A and B (Fig. 1F and Table 1). The results showed that the level of immune infiltration in elderly patients was lower than that of young patients (p = 0.046). No significant difference was observed regarding the distribution different tumor stages among the two TME clusters (p = 0.994).

Establishing and verification of the random forest classification model
To establish a practical clinical method, we divided ovarian cancer patients into TME cluster A and TME cluster B, and then established a forest classification model according to the Breiman random forest algorithm. As described below, 343 patients from the TCGA database were randomly allocated into a training group (n = 240) and an experimental group (n = 103). The relative abundance (GSVA score) of the 24 immune cell types was used as the input variable, and TME cluster determined by unsupervised clustering was used as response factor. In the training queue, after 10,000 iterations, the optimal mtry and ntrees were 4 and 4000, respectively. To test the discriminatory ability of the model, we applied it to the training queue and internal test queue to verify the effectiveness of the model. The prediction accuracy was 100% and 99%, respectively, which proved the reliability and stability of the model. We then used these models in the external validation cohort from the GEO database, which included 493 ovarian cancer patients (GSE3149, GSE26712, and GSE63885). After GSVA scores of 2, the four immune cell types were calculated and fitted with the model, the patients were divided into TME cluster A and B for external validation. The heatmap ( Fig. 2A) and box plot (Fig. 2B) revealed that TME cluster B from the GEO database was also enriched for cells of the innate and adaptive immune systems, which was consistent with analysis of TCGA data. These results further confirmed the clinical and immunological significance of the random typing model.

Identification and functional annotation of DEGs
To determine the biological differences between TME clusters A and B from TCGA database, we identified DEGs. Compared with TME cluster A, 494 upregulated genes and 73 downregulated genes were found in TME cluster B (Fig. 3A). The enrichment of 567 DEGs by GO and KEGG were analyzed with R cluster profiler package. DEGs from TME cluster B were observed to be enriched in immune-related pathways, mainly including the antigen presentation pathway, IFN-γ signaling, leukocyte activation, cytokine receptor interaction, allogeneic rejection, and Th17 cell differentiation regulation, supporting the abundant immune infiltration pattern in TME cluster B (Figs. 3B and 3C).

Identification and functional annotation of DMGs
We performed comparative analyses to identify DMGs in ovarian cancer between TME cluster A and B using TCGA dataset. Overall, we identified 242 DMGs with 194 hypermethylated (methylogFC >0.1; adjusted p < 0.05) and 48 hypomethylated DMGs (methylogFC <−0.1; adjusted p < 0.05). We integrated the DMGs and DEGs to identify 35 differentially methylated and expressed genes (DMEGs). Furthermore, we divided the DMEGs into four different groups, namely hypo-up (n = 17), hyper-up (n = 12), hyper-down (n = 5), and hypo-down (n = 1) (Fig. 3D). We focused on the DMEGs in the hyper-down and hypo-up groups in the subsequent analyses. LAPTM5 was the most upregulated gene (logFC = 1.78, adjusted p < 0.001) and DLK1 was the most downregulated gene (logFC = −0.09, adjusted p < 0.001). Functional enrichment analyses showed that the DMGs in TME cluster B participated in important biological processes and immune related pathways, including regulation of antigen presentation, regulation of B cell receptor signaling, cytokine-cytokine receptor interaction, and Th17 cell differentiation, indicating more immune cell infiltration in TME cluster B (Figs. 3E and 3F).

Characteristics of the immune microenvironments of the TME clusters
To further investigate the different genetic and molecular characteristics of TME clusters A and B, we analyzed the expression profiles of several immune-related genes and cytokines in the 343 ovarian cancer samples. First, we used a seven-gene panel derived from the Bopler trial as an alternative indicator to quantify the levels of CD8A, CXCL10, IFNγ , GZMA, GZMB, EOMES, PRF1, and TBX21 (Fehrenbacher et al., 2016). These genes have previously been connected with activated T cells, immune cytolytic activity, and interferon-γ expression (Fehrenbacher et al., 2016;Rooney et al., 2015;Wang, Windgassen & Papoutsakis, 2008). Second, Rooney et al. (2015) defined cytolytic activity scores, namely the geometric mean of PRF1 and GZMA, to indicate the intensity of antitumor response. The parameters, including CD8A, CXCL10, EOMES, GZMA, GZMB, IFN-γ , TBX21, and PRF1, showed significantly higher levels in TME cluster B than TME cluster A (all p < 0.001), indicating that patients with higher infiltration had enhanced cytotoxic function compared with patients who showed low infiltration (Fig. 4A). The comparison of the two phenotypes (such as MyD88, TICAM1, and TLR3) with molecules potentially involved in innate immune priming showed a similar trend (except TLR9, all p < 0.05; Fig. 4B, left). Additionally, compared with TME cluster A, TME cluster B showed abundant expression of MHC-I/II-related antigen presenting molecules ( p < 0.001; Fig. 4B, right). The TME cluster B group showed infiltration of abundant active innate and adaptive immune cells, as well as immunosuppressive cells, such as Tregs and iDCs. Therefore, we investigated the effect of immune activation and immune inhibition according to the CD8 + T cell/Treg ratio (Fig. 4C). This showed that TME cluster B had a significantly lower ratio than TME cluster A, which may be caused by a feedback mechanism initiated by recruiting effector T cells into the TME. In order to confirm this speculation, we investigated the expression patterns of 15 immune checkpoint molecules (Fig. 4B, left three) and 20 costimulatory molecules (Fig. 4B, right). The thermogram revealed some relatively high expressed costimulatory molecules (most p < 0.001, such as CTLA4, PDCD1,LAG3,and CD274,Figs. 4D and 4E) in TME cluster B group, suggesting that these patients may benefit from immune checkpoint inhibitors.

Somatic mutations in ovarian cancer
We also investigated whether somatic mutations were associated with invasiveness independent of tumor origin. In our analyses, the mutation load in TME cluster B  was insignificantly higher than that in TME cluster A (p = 0.076; Fig. 5A). Generally, more mutations are associated with increased heterogeneity, indicating stronger immune rejection and more prominent immune cell infiltration. We also found that patients in TME cluster B were more susceptible to PIK3CA mutation than those in TME cluster A (61% vs. 46%; p = 0.031; Fig. 5B). These results suggested that PIK3CA mutation may be a major driver of dysregulated expression of invasiveness-related genes in human cancer.

Prognostic value of the TME clusters
Univariate and multivariate analyses were also performed to evaluate the relationship between several clinical factors and prognosis in the two TME clusters (

DISCUSSION
Recently, immunotherapy has aroused great interest following improvements in our understanding of the molecular basis of immune recognition and regulation of tumor cells.
The infiltration and activation of immune cells are believed to be closely related to intrinsic tumor processes and prognosis. In ovarian cancer, it has been found that the presence of lymphocyte infiltration is associated with improved prognosis (Sato et al., 2005). In contrast, the survival rate of ovarian cancer patients who lack lymphocyte infiltration is reduced (Hwang et al., 2012). To better understand the distribution of immune cells and factors in the ovarian cancer microenvironment and find new potential targets for immunotherapy, we divided cases into a low infiltration group (TME cluster A) and an infiltration group (TME cluster B) according to the individual immune infiltration patterns. The distribution of immune cells and immune related factors were statistically analyzed. This analysis revealed that TME infiltration pattern was an independent prognostic factor for ovarian cancer patients. DNA methylation is a major epigenetic mechanism, which plays important roles in various biological processes (Bock, 2012), including regulating gene expression (Kulis & Esteller, 2010), cell differentiation (Smith & Meissner, 2013), and inflammation (Horsburgh et al., 2015). Aberrant methylation was considered to be involved in various diseases, including ovarian cancer (Seeber & Van Diest, 2012). In this study, we identified 35 DMEGs and found that LAPTM5 was the most upregulated gene, while Dlk1 was the most downregulated gene. LAPTM5 has been reported to show a negative correlation between 5 -UTR methylation and homologous gene expression, indicating that it is at least partially regulated by DNA methylation (Cortese et al., 2008). DLK1 encodes a transmembrane protein with multiple EGF repeats that plays a role in regulating cell growth. Dlk1 overexpression participated in ovarian carcinogenesis through Notch activation and EMT induction (Huang et al., 2019), which may depend on its hypermethylated level. However, no study has focused on the relationship between Dlk1 methylation and ovarian cancer. Inhibition of RUNX1 promotes cisplatin-induced apoptosis in ovarian cancer cells (Xiao et al., 2020). MMP11 was found to be related with bowel metastases in ovarian cancer (Mariani et al., 2019). However, the methylation level of the above genes associated with their functions in ovarian cancer are still unclear.
This study had several limitations. First, we included only a small number of ovarian cancer cases. Second, we failed to use non-database cases for external verification. More clinical data are needed to verify the reliability of this classification standard. Further exploration in this area is necessary. Third, we enrolled the ovarian cancer patients with all histological subtypes because the sample size of any specific histological subtype was insufficient. In the further study, we may enroll sufficient sample size of specific histological subtypes, such as ovarian clear cell carcinoma, mucinous ovarian cancer or serous ovarian cancer, to make in-depth exploration and study.