A methylomics-associated nomogram predicts the overall survival risk of stage III to IV ovarian cancer

Accumulating studies have demonstrated that DNA methylation may serve as a prognostic hallmark in various cancers. However, few studies have focused on the power of DNA methylation for prognostic prediction in patients with stage III to IV ovarian cancer (OC). A methylomics-related indicator to predict the overall survival (OS) of patients with stage III to IV OC is therefore urgently needed. A total of 520 OC patients with 485,577 DNA methylation sites from the TCGA database were selected to develop a robust DNA methylation signature. The 520 patients were divided into a training group (70%, n = 364) and an internal validation group (30%, n = 156). The training group was used to identify a prognostic predictor based on univariate Cox proportional hazards analysis, least absolute shrinkage and selection operator (LASSO) analysis, and multivariate Cox regression analysis. The internal validation group and an external validation group (ICGC OV-AU project) were used to validate the predictive robustness of the predictor with receiver operating characteristic (ROC) analysis and Kaplan–Meier survival analysis. We identified a 21-DNA methylation signature-based classifier for the OS of patients with stage III to IV OC. ROC analysis in the internal validation, external validation, and entire TCGA sets showed the high power of the 21-DNA methylation signature for predicting OS: the areas under the curve (AUCs) at 1, 3, and 5 years were 0.782, 0.739, and 0.777 in the internal validation set; 0.828, 0.760, and 0.741 in the external validation set; and 0.741, 0.748, and 0.781 in the entire TCGA set. In addition, a nomogram was developed from the methylation risk score and several clinical variables, and it showed high predictive ability. In summary, we used integrated bioinformatics approaches to identify a DNA methylation-associated nomogram that can effectively predict the OS of patients with stage III to IV OC.


Introduction
Ovarian cancer (OC) is one of the most common gynecologic malignancies in the United States and ranks fifth in cancer-related mortality among women. [1] Despite improvements in the therapy of advanced OC, the overall 5-year survival rate remains below 50%. [2] Most patients are diagnosed at an advanced stage, which precludes curative therapy. [3] The staging system of the International Federation of Gynecology and Obstetrics (FIGO) is commonly used to estimate prognosis and to guide an optimized treatment plan. [4] However, although FIGO stage is a significant clinical prognostic factor in OC, it is insufficient for predicting survival time. Thus, it is crucial to identify prognostic classifiers that can reliably stratify OC patients for individualized treatment.
Specific molecular biomarkers have been shown to be associated with the prognosis of OC. For example, Zheng et al identified a molecular marker associated with OC prognosis using bioinformatics analysis and experiments. [5] Zannoni et al showed that M-CAM expression served as a marker of poor prognosis in epithelial OC. [6] Baekelandt et al revealed that P-glycoprotein expression was a marker of chemotherapy resistance and prognosis in advanced OC. [7] In the search for reliable prognostic hallmarks of various cancers, DNA methylation has been shown to be a potential prognostic predictor. For example, Fiano et al revealed that DNA methylation in repeat negative prostate biopsies served as a marker of missed prostate cancer. [8] Jouinot et al indicated that DNA methylation was an independent prognostic biomarker of survival in adrenocortical carcinoma. [9] Schmitz et al showed the capacity of a DNA methylation marker panel applied to liquid-based cervical scrapes to detect cervical carcinoma and its precancerous stages. [10] Zhang et al suggested that DNA methylation-mediated silencing of microRNA-874 was a promising diagnostic and prognostic hallmark in breast cancer. [11] DNA methylation has also been shown to be a reversible biological signal and may therefore provide potential therapeutic targets. [12] Comprehensive analysis of DNA methylation is thus promising for developing reliable prognostic predictors that support individualized therapy and improve patient survival. However, few studies have focused on the power of DNA methylation for prognostic prediction in patients with stage III to IV OC, so building a methylomics-related indicator to predict the OS of these patients is well motivated.
In this study, we identified a DNA methylation signature from microarray profiling that is related to OS in stage III to IV OC. The power of the signature was tested by Kaplan–Meier analysis and receiver operating characteristic (ROC) curve analysis, and the results suggested that the hallmark has the potential to improve the care of women with stage III to IV OC. In addition, our nomogram showed high prognostic predictive ability.

DNA methylation information of stage III to IV OC patients
DNA methylation data of stage III to IV OC from The Cancer Genome Atlas (TCGA) database, profiled on the Illumina HumanMethylation450 BeadChip (Illumina Inc, CA), together with eligible clinical information, were obtained with the R package TCGAbiolinks. [13] Data for the OV-AU project were manually downloaded from the International Cancer Genome Consortium (ICGC) database. [14] DNA methylation levels were expressed as β values, computed as β = M/(M + U + 100), where M is the methylated signal and U is the unmethylated signal. Patients lacking survival data were excluded from further analysis. The present study comprised a total of 520 stage III to IV OC patients with 485,577 DNA methylation sites. The 520 patients were divided into a training group (70%, n = 364) for identifying a prognostic predictor and an internal validation group (30%, n = 156) for validating the predictive robustness of the predictor. In addition, the 93 stage III to IV OC patients of the OV-AU project from the ICGC database were used as an external validation dataset. Least absolute shrinkage and selection operator (LASSO) analysis was used to screen key methylation sites for building a predictor of OS in stage III to IV OC. LASSO analysis was carried out with 1000 iterations using the publicly available R package "glmnet." [14] Because our study did not directly involve human participants or animals, ethics committee approval was not required; the Institutional Review Board of Taikang Tongji (Wuhan) Hospital approved the study.
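As a minimal illustration of the β-value definition and the 70/30 split described above, the following Python sketch computes β from hypothetical methylated and unmethylated intensities and splits 520 sample IDs into training and validation groups. The random shuffle, the seed, and the sample ID format are assumptions for illustration only: the text does not state how the split was drawn, and the study itself worked in R.

```python
import random


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta value = M / (M + U + offset), with the offset of 100 used in the text."""
    if meth < 0 or unmeth < 0:
        raise ValueError("signal intensities must be non-negative")
    return meth / (meth + unmeth + offset)


def split_train_validation(sample_ids, train_fraction=0.7, seed=2023):
    """Hypothetical random split: shuffle IDs, then take the first 70% for training."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# A probe with a strong methylated signal: 9000 / (9000 + 900 + 100)
print(beta_value(9000, 900))  # 0.9

# Hypothetical sample IDs standing in for the 520 TCGA patients
train, valid = split_train_validation([f"sample_{i:03d}" for i in range(520)])
print(len(train), len(valid))  # 364 156
```

The offset of 100 keeps β defined (β = 0) even when both signals are zero, and damps β for probes with very low total intensity.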

Data processing, normalization and identification of differentially methylated sites
The raw data were pre-processed before identifying a prognostic biomarker of stage III to IV OC. A DNA methylation site was excluded if its value was missing in any sample. The data were then normalized with the "betaqn" function of the wateRmelon package. [15] Next, patients were divided into recurrence and non-recurrence groups according to recurrence status. The normalized β values were transformed to M values using M = log2(β/(1 − β)); M values were used to reduce the differences in variance among probes. Finally, differentially methylated sites between the recurrence and non-recurrence groups were identified from the M values with the "dmpFinder" function of the minfi package. [16]
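The filtering and transformation steps above can be sketched in Python as follows. This is a simplified stand-in, not the minfi or wateRmelon implementation: it drops probes with any missing β value, converts β to M with a base-2 logarithm (the usual M-value convention; the clipping constant `eps` is an assumption), and scores each probe with a two-sample pooled-variance t statistic. For two groups, the square of this t statistic equals the one-way F statistic on which dmpFinder's categorical test is based. betaqn normalization is not reproduced, and the probe names and values are hypothetical.

```python
import math
from statistics import mean, variance


def drop_incomplete_probes(beta_by_probe):
    """Keep only probes whose beta value is present (not None/NaN) in every sample."""
    return {
        probe: values
        for probe, values in beta_by_probe.items()
        if all(v is not None and not math.isnan(v) for v in values)
    }


def beta_to_m(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); beta is clipped to [eps, 1 - eps] to avoid infinities."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def pooled_t(group_a, group_b):
    """Two-sample pooled-variance t statistic (t**2 equals the one-way F for 2 groups)."""
    na, nb = len(group_a), len(group_b)
    sp2 = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical toy data: 6 samples, the first 3 with recurrence
betas = {
    "cg_hypothetical_1": [0.80, 0.85, 0.82, 0.30, 0.25, 0.35],
    "cg_hypothetical_2": [0.50, None, 0.52, 0.49, 0.51, 0.50],  # missing value: dropped
}
recurrence = [True, True, True, False, False, False]

clean = drop_incomplete_probes(betas)
t_stats = {}
for probe, values in clean.items():
    m_values = [beta_to_m(v) for v in values]
    rec = [m for m, r in zip(m_values, recurrence) if r]
    non = [m for m, r in zip(m_values, recurrence) if not r]
    t_stats[probe] = pooled_t(rec, non)
```

Working on the M scale spreads out β values near 0 and 1, which is why variance is more comparable across probes there.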

Generation of methylomics-related signature
Univariate Cox proportional hazards analysis was performed to identify methylation sites significantly (P < .05) associated with the OS of stage III to IV OC patients as potential factors. Next, LASSO Cox regression analysis was applied to these potential factors to further select candidate sites associated with OS. The candidate sites were then entered into multivariate Cox regression analysis to establish the methylome-related classifier for OS. Finally, a combination of 21 DNA methylation sites was used to construct the DNA methylation-based prognostic classifier, and a risk-score formula based on this 21-DNA methylation signature was used to compute the OS risk score of every sample. Patients with risk scores above the median risk score were assigned to the high-risk group, whereas patients with risk scores below the median formed the low-risk group. ROC analysis was performed to evaluate the value of the 21-DNA methylation-based signature: the area under the curve (AUC), computed with the "survivalROC" package, was used to assess the predictive power of the classifier for the OS of stage III to IV OC patients. [17] Kaplan–Meier survival analysis was used to compare OS between the high- and low-risk groups, and Kaplan–Meier curves were generated with the "survival" package. [18]
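The text does not print the risk-score formula. A common construction, assumed here, is a linear predictor: the sum over the signature's sites of each site's multivariate Cox coefficient multiplied by that site's methylation value. The sketch below uses two hypothetical sites and coefficients (the actual signature has 21) and applies the median cutoff described above.

```python
from statistics import median


def risk_scores(methylation, coefficients):
    """Risk score per sample: sum over sites of (Cox coefficient x methylation value)."""
    return {
        sample: sum(coef * sites[site] for site, coef in coefficients.items())
        for sample, sites in methylation.items()
    }


def assign_risk_groups(scores):
    """Label samples above the median score 'high' and the rest 'low'."""
    cutoff = median(scores.values())
    groups = {s: ("high" if v > cutoff else "low") for s, v in scores.items()}
    return groups, cutoff


# Hypothetical coefficients for 2 sites (the real model has 21)
coefficients = {"site_a": 1.2, "site_b": -0.8}
methylation = {
    "P1": {"site_a": 0.9, "site_b": 0.1},
    "P2": {"site_a": 0.2, "site_b": 0.7},
    "P3": {"site_a": 0.6, "site_b": 0.3},
    "P4": {"site_a": 0.4, "site_b": 0.5},
}
scores = risk_scores(methylation, coefficients)
groups, cutoff = assign_risk_groups(scores)
```

A sample whose score equals the median exactly is placed in the low-risk group here; the text does not say how such ties were handled.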

Gene set variation analysis
To explore the signaling pathways associated with the 21-DNA methylation signature, we performed single-sample gene set enrichment analysis on the TCGA OC mRNA dataset using the gene set variation analysis (GSVA) package. [19] The pathways most positively associated with the DNA methylation risk score were assessed. Patients with risk scores above the median were assigned to the high-risk group, and patients with risk scores below the median formed the low-risk group. Significance was set at P < .05.
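The enrichment step can be illustrated with a simplified single-sample GSEA (ssGSEA) score. This sketch follows the ssGSEA random-walk idea: genes are ranked within one sample, and a rank-weighted running sum for pathway genes is compared with a uniform running sum for all other genes. The weight exponent alpha = 0.25 is an assumption. The GSVA package additionally normalizes scores across samples and offers other methods, so numbers from this sketch are not expected to match its output. Gene names and expression values are hypothetical.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified ssGSEA enrichment score for one sample (no cross-sample normalization).

    expression: {gene: expression value} for a single sample
    gene_set: genes belonging to the pathway
    """
    members = set(gene_set)
    genes = sorted(expression, key=expression.get)  # ascending expression
    rank = {g: i + 1 for i, g in enumerate(genes)}   # 1 = lowest, N = highest
    ordered = genes[::-1]                            # walk from highest to lowest
    hits = [g in members for g in ordered]
    n_miss = len(ordered) - sum(hits)
    if sum(hits) == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, measured genes")
    hit_total = sum(rank[g] ** alpha for g, h in zip(ordered, hits) if h)
    p_hit = p_miss = score = 0.0
    for g, h in zip(ordered, hits):
        if h:
            p_hit += rank[g] ** alpha / hit_total  # rank-weighted step for set members
        else:
            p_miss += 1 / n_miss                   # uniform step for non-members
        score += p_hit - p_miss                    # ssGSEA sums the running difference
    return score


# Hypothetical sample with 10 genes; G10 is the most highly expressed
sample = {f"G{i}": float(i) for i in range(1, 11)}
up = ssgsea_score(sample, {"G10", "G9", "G8"})
down = ssgsea_score(sample, {"G1", "G2", "G3"})
```

A pathway whose genes sit at the top of a sample's ranking gets a positive score, and one concentrated at the bottom gets a negative score; comparing these scores between risk groups is what highlights pathways activated in high-risk cases.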

Construction of the nomogram
To improve the prognostic discrimination of the 21-DNA methylation signature for stage III to IV OC, a nomogram was established with the "rms" R package. Univariate and multivariate Cox proportional hazards analyses were performed on the methylation risk score and other clinicopathological factors, and Cox proportional hazards models were used to compute hazard ratios (HRs) with corresponding 95% confidence intervals (CIs). Factors that were significant (P ≤ .05) in the multivariate Cox analysis were used to construct the nomogram. The C-index, ROC curves, calibration plots, and decision curve analysis were used to evaluate the prognostic performance of the nomogram. In the calibration curve, the 45° line represents perfect prediction.
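Harrell's C-index, used above to evaluate the nomogram, is the fraction of comparable patient pairs in which the patient with the higher predicted risk experienced the event earlier. The following is a minimal Python sketch, not the rms implementation. It assumes right-censored data, counts ties in predicted risk as one half, and for simplicity skips pairs with tied follow-up times. The follow-up times, event indicators, and risk scores are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up had an event;
    it is concordant when that patient also has the higher predicted risk.
    Ties in predicted risk count 0.5; pairs with tied follow-up times are skipped.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a = shorter follow-up
        if not events[a]:
            continue  # censored before b's time: ordering of outcomes unknown
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times, event indicators and nomogram risk scores
times = [5, 10, 15, 20]
events = [1, 1, 1, 0]
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # 1.0 (perfect ranking)
print(harrell_c_index(times, events, [1, 2, 3, 4]))  # 0.0 (fully reversed)
```

A C-index of 0.5 corresponds to random ranking. Against that reference, the reported value of 0.818 indicates that the nomogram ordered most comparable pairs correctly.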

Clinical features of the study populations
In total, 520 TCGA patients and 93 ICGC patients who were clinically and pathologically diagnosed with stage III to IV OC were enrolled in the present study. The clinical features of these patients in the TCGA and ICGC datasets are shown in Table 1, and the experimental procedures are shown in Figure 1.

Association between the 21-DNA methylation signature and the OS of stage III–IV OC patients in the internal validation, external validation, and entire TCGA sets
Samples were separated into low- and high-risk groups according to the 21-DNA methylation-related classifier, and Kaplan–Meier survival analysis was used to compare OS between the 2 groups. Patients in the high-risk group had significantly poorer OS in the internal validation set (P = 7e-07) (Fig. 4A), and similar results were observed in the external validation set (P = 4e-05) (Fig. 4C) and the entire TCGA set (P < .001) (Fig. 4E). These results suggest that our signature can effectively distinguish patients with good and poor prognosis.
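The Kaplan–Meier estimator and the two-group log-rank test behind these comparisons can be sketched in plain Python. The study used the R survival package. This illustration uses hypothetical toy data, and the p-value relies on the fact that a chi-square variable with 1 degree of freedom has upper-tail probability erfc(sqrt(x/2)).

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time: [(time, S(t))]."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group subjects sharing time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings at t leave the risk set
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi-square, p-value)."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups) if x == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square (1 df) upper tail


# Toy data: time in years, 1 = death, 0 = censored
curve = kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])
chi2, p = logrank_test(
    times=[1, 2, 3, 4, 5, 6, 7, 8],
    events=[1, 1, 1, 1, 1, 1, 1, 1],
    groups=[1, 1, 1, 1, 0, 0, 0, 0],  # group 1 = hypothetical high-risk
)
```

Very small p-values from such tests can underflow to zero in floating point, which is why software sometimes prints them as 0.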

Exploration of the predictive capacity of the 21-DNA methylation signature with ROC analysis
The power of the 21-DNA methylation-related classifier for predicting the OS of stage III–IV OC patients was examined with time-dependent ROC curves. The AUCs of the classifier at 1, 3, and 5 years in the internal validation set were 0.782, 0.739, and 0.777, respectively (Fig. 4B). Good performance was also observed in the external validation set (0.828, 0.760, and 0.741) (Fig. 4D) and the entire TCGA set (0.741, 0.748, and 0.781) (Fig. 4F), indicating that the 21-DNA methylation-related biomarker has strong power for predicting the OS of OC patients. Next, samples were ranked by risk score (Fig. 5A), and a dot plot was drawn according to recurrence status (Fig. 5B); the high-risk group had worse OS than the low-risk group. A heatmap of the 21 methylation sites ordered by risk score is presented in Figure 5C and shows a pattern similar to that of the boxplots of methylation β values by risk group (see Fig. S2, http://links.lww.com/MD/I375, Supplemental Digital Content, which shows methylation risk score analysis of 93 stage III–IV OC patients in the OV-AU project; see Table S2, http://links.lww.com/MD/I378, Supplemental Digital Content, which shows 21 DNA methylation signature-associated biological pathways).
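A time-dependent AUC at horizon t can be read as the probability that a patient who died by t has a higher risk score than a patient still alive after t. The sketch below computes this cumulative/dynamic AUC naively by dropping patients censored before t. survivalROC instead adjusts for censoring (via Kaplan–Meier or nearest-neighbor estimation), so this is an approximation for illustration. The times (in years), event indicators, and risk scores are hypothetical.

```python
def cumulative_dynamic_auc(times, events, marker, horizon):
    """Naive time-dependent AUC at `horizon`.

    Cases: patients with an event at or before the horizon.
    Controls: patients whose follow-up extends beyond the horizon.
    Patients censored before the horizon are simply dropped (no censoring weights).
    """
    cases = [m for t, e, m in zip(times, events, marker) if t <= horizon and e]
    controls = [m for t, m in zip(times, marker) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    score = 0.0
    for c in cases:
        for k in controls:
            # a case ranked above a control counts 1, a tie counts 0.5
            score += 1.0 if c > k else (0.5 if c == k else 0.0)
    return score / (len(cases) * len(controls))


# Hypothetical follow-up (years), death indicators and risk scores
times = [0.5, 0.8, 2.0, 3.5, 4.0]
events = [1, 1, 1, 0, 1]
risk = [2.1, 1.7, 0.4, 0.3, 0.5]
auc_1y = cumulative_dynamic_auc(times, events, risk, horizon=1)
auc_3y = cumulative_dynamic_auc(times, events, risk, horizon=3)
```

With these toy values, every 1-year case outranks every control (AUC 1.0). By 3 years, one case scores below one control, so the AUC drops to 5/6.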
Finally, subgroup analyses were performed according to several clinical covariates, including age, stage, tumor residual, and cancer status. The 21-DNA methylation-related classifier performed well in most subgroups (see Figs. S3–S6, http://links.lww.com/MD/I376, Supplemental Digital Content, which show Kaplan–Meier and ROC analyses of patients with stage III–IV OC in subgroups according to age, stage, tumor residual, and cancer status, respectively). These findings demonstrate that our DNA methylation signature has high predictive power.

Identification of the 21 DNA methylation classifier-related biological pathways
The median score was used as the cutoff to stratify samples into low- and high-risk groups. The top 20 pathways that were more activated in the high-risk cases than in the low-risk cases are presented in Figure 6A. The same trend was further confirmed between the enriched pathways and the DNA methylation risk score (Fig. 6B).
Figure 3. Boxplots of methylation β values against risk group in the entire TCGA dataset. "High Risk" and "Low Risk" refer to the high-risk and low-risk groups, respectively. The median risk score was applied as the cutoff. The y-axis shows the β value of each of the 21 DNA methylation sites. Differences between the 2 groups were assessed with the Mann–Whitney U test. TCGA = The Cancer Genome Atlas.

Nomogram construction
To explore whether the 21-DNA methylation-related signature was an independent classifier for the OS of stage III–IV OC patients, univariate and multivariate Cox regression analyses were performed on the methylation-related risk score and several clinical covariates. The HRs showed that the 21-DNA methylation-related classifier was closely associated with OS (P < .001, HR 2.15, 95% CI 2.15-2.50) (Table 2) (Fig. 7), indicating that the signature functioned as an independent prognostic classifier. To predict the OS of stage III to IV OC patients quantitatively, we developed a nomogram (Fig. 8) based on the risk score, cancer status, and age. The significance of the 21-DNA methylation-based factor relative to the conventional clinical covariates is shown in Figure 9A. The performance of the nomogram was evaluated with the C-index (0.818, 95% CI: 0.778-0.864), the AUC (1, 3, and 5 years: 0.749, 0.779, and 0.832, respectively) (Fig. 9B), and calibration plots (Fig. 9C-E), all of which indicated strong performance of the model. In addition, decision curve analysis showed that the nomogram provided greater clinical net benefit for the prognostic prediction of stage III to IV OC than the treat-all or treat-none strategies, with net benefit achieved for 3-year recurrence risk (Fig. 9F). These results indicate that our methylomics-based nomogram has strong predictive capacity and potential clinical utility.
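Decision curve analysis compares the net benefit of acting on the model with treating all or no patients. Under the standard definition, net benefit at threshold probability p_t is TP/n − FP/n × p_t/(1 − p_t), and the treat-none strategy has net benefit 0. The sketch below assumes a binary 3-year outcome known for every patient. Survival-based decision curves handle censoring (for example, with Kaplan–Meier estimates), which this simplification ignores. The predicted risks and outcomes are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcome)
    treated = [y for r, y in zip(predicted_risk, outcome) if r >= threshold]
    tp = sum(1 for y in treated if y)
    fp = len(treated) - tp
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Net benefit of treating everyone; treating no one has net benefit 0."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical 3-year predicted risks and observed 3-year events (1 = event)
risk = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1]
event = [1, 1, 0, 1, 0, 0]
nb_model = net_benefit(risk, event, threshold=0.5)
nb_all = net_benefit_treat_all(event, threshold=0.5)
```

In this toy case, the model's net benefit at p_t = 0.5 is 1/6, while treat-all and treat-none both give 0. A decision curve repeats this comparison across a range of thresholds.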

Discussion
In this study, the TCGA and ICGC databases were used to explore DNA methylation hallmarks of OS in stage III to IV OC. We identified a signature of 21 DNA methylation sites by combining differential methylation analysis, Cox regression analysis, ROC analysis, and Kaplan–Meier analysis. These 21 DNA methylation sites correspond to 24 genes (ME2, TF, DNASE2, TFPI2, C20orf117, ADCY2, DYRK1B, EHD2, SFPQ, MCM7, AP4M1, C1orf131, GNPAT, AKTIP, UTP14C, ALG11, ZNF584, FAM76A, MYNN, H19, NPPA, SOD2, ZNF652, MPP1). Interestingly, previous studies have shown that most of these 24 genes are associated with cancer. For instance, Rita et al studied the combination of 2-methoxyestradiol (2-ME2) and eugenol in androgen-independent prostate cancer cells. [20] Xu et al identified cancer subtypes from miRNA-TF-mRNA regulatory networks and expression data. [21] Marek et al indicated that human OC cells may be eradicated by transgenic expression of recombinant DNASE1, DNASE1L3, DNASE2, and DFFB controlled by the EGFR promoter. [22] Dong et al suggested that hypermethylation of TFPI2 was correlated with cervical cancer incidence in the Uygur and Han populations of Xinjiang, China. [23] Yu et al revealed that SPARCL1, SHP2, MSH2, E-cadherin, p53, ADCY-2, and MAPK were prognosis-associated in colorectal cancer. [24] Chen et al reported that DYRK1B overexpression was associated with breast cancer growth and poor prognosis. [25] Yang et al demonstrated that EHD2 played an important role in the migration and invasion of human breast cancer cells. [26] The NEAT1_2-SFPQ axis mediated cisplatin resistance in liver cancer cells in vitro. [27] MCM7 promoted cancer progression via cyclin D1-dependent signaling and served as a prognostic hallmark for patients with hepatocellular cancer. [28] Stabilization of FASN by ACAT1-mediated GNPAT acetylation promoted lipid metabolism and hepatocarcinogenesis. [29] UTP14c has recently been suggested to be expressed in 80% of OCs. [30] Khakpour et al suggested that ZNF584 was involved in breast cancer. [31] MYNN and TERC gene polymorphisms were associated with bladder cancer in a Turkish population. [32] The lncRNA H19 promoted lung cancer proliferation and metastasis by inhibiting miR-200a function. [33] Strong SOD2 expression and HPV-16/18 positivity were independent events in cervical cancer. [34] Co-expression of the androgen receptor and the transcription factor ZNF652 was related to prostate cancer outcome. [35] Thus, 18 of the 24 genes corresponding to the 21 sites have been reported to play key roles in cancer development, and we speculate that these 18 genes may be involved in the prognosis of stage III to IV OC patients.
Figure 4. (A, C, E) Kaplan–Meier curves; the log-rank test was used to assess differences in OS between low-risk and high-risk stage III to IV OC patients. (B, D, F) 1-, 3-, and 5-year ROC curves of the 21-DNA methylation-based signature were used to assess its value for predicting the OS of stage III to IV OC patients. "High" and "Low" stand for the high and low risk score groups, respectively. The median risk score was set as the cutoff. OC = ovarian cancer, OS = overall survival, ROC = receiver operating characteristic curve, TCGA = The Cancer Genome Atlas.
Accumulating studies have used nomograms to improve the prognostic robustness of clinical results by incorporating several independent clinical factors into a single quantitative risk probability. For instance, Amita et al developed preoperative nomograms combining magnetic resonance imaging and spectroscopy for predicting insignificant prostate carcinoma. [36] Lee et al developed a prognostic nomogram to predict OS in patients with platinum-sensitive recurrent OC. [37] To our knowledge, our study is the first to combine clinical factors with a methylation predictor to move beyond simple classification toward individualized OS prediction in stage III to IV OC. The results suggest that our nomogram has strong capacity for predicting the OS of stage III to IV OC patients in clinical routine, which supports the reliability of our findings. LASSO is based on shrinkage estimation and has been widely applied in statistics. For example, a previous study reported the application of Bayesian LASSO for genomic selection in French Holstein and Montbéliarde breeds, [38] and another study achieved parsimonious and robust multivariate calibration with rational function LASSO and rational function elastic net. [39] LASSO can achieve a smaller mean squared error than conventional approaches, overcome multicollinearity, and perform variable selection and coefficient shrinkage simultaneously. [40][41][42] In our study, a LASSO Cox regression model was applied between the univariate and multivariate Cox analyses to select candidate DNA methylation sites associated with the OS of stage III to IV OC patients, thereby excluding interference from possible multicollinearity. In this way, LASSO Cox regression improved the predictive accuracy of the 21-DNA methylation-related classifier.
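To illustrate the shrinkage and variable selection described above, the following sketch fits a Gaussian (least-squares) LASSO by cyclic coordinate descent with soft-thresholding, which is the algorithmic idea underlying glmnet. It is not the Cox LASSO used in this study, which penalizes the partial likelihood. The data are toy values, and the features and response are assumed to be centered, so no intercept is fitted.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=1000, tol=1e-10):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows; columns of X and y are assumed centered (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - Xb for b = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # correlation of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Toy data: y depends only on the first (centered) feature, y = 3 * x1
X = [[1.0, 0.5], [-1.0, 0.2], [2.0, -0.3], [-2.0, -0.4]]
y = [3.0, -3.0, 6.0, -6.0]
b_ols_like = lasso_coordinate_descent(X, y, lam=0.0)
b_lasso = lasso_coordinate_descent(X, y, lam=1.0)
```

With lam = 0 the fit recovers the least-squares coefficients (3, 0). With lam = 1 the first coefficient shrinks to 2.6 and the second is set exactly to zero, showing how the penalty both shrinks and selects.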
Our study has several limitations. First, the number of stage III–IV OC patients in our external validation set was limited, and prospective studies with more samples are needed to validate the value of the 21-DNA methylation signature. In addition, more clinical factors should be included in the external validation set to improve the reliability of the DNA methylation-related predictive model. Finally, the nomogram was developed from retrospective data obtained from the TCGA database, which may carry a risk of selection bias.
In conclusion, despite the limitations mentioned above, this comprehensive analysis of high-throughput data has valuable implications: it identified a 21-DNA methylation signature for predicting the OS of stage III to IV OC patients by combining differential methylation analysis, Cox regression analysis, ROC analysis, and Kaplan–Meier analysis. In addition, we constructed a nomogram that integrates the 21-DNA methylation-related signature and several clinical covariates to improve the quantitative prediction of prognosis in stage III to IV OC. The results indicated that our nomogram is robust for predicting the OS of stage III to IV OC.