A Risk Score Model Based on Nine Differentially Methylated mRNAs for Predicting Prognosis of Patients with Clear Cell Renal Cell Carcinoma

Purpose DNA methylation alterations play important roles in initiation and progression of clear cell renal cell carcinoma (ccRCC). In this study, we attempted to identify differentially methylated mRNA signatures with prognostic value for ccRCC. Methods The mRNA methylation and expression profiling data of 306 ccRCC tumors were downloaded from The Cancer Genome Atlas (TCGA) to screen differentially methylated lncRNAs and mRNAs (DMLs and DMMs) between bad and good prognosis patients. Uni- and multivariable Cox regression analyses and LASSO Cox-PH regression analysis were used to select prognostic lncRNAs and mRNAs. Corresponding risk scores were calculated and compared for predictive performance in the training set using Kaplan-Meier OS and ROC curve analyses. The optimal risk score was then identified and validated in the validation set. Function enrichment analysis was conducted. Results This study screened 461 DMMs and 63 DMLs between good prognosis and bad prognosis patients, and furthermore, nine mRNAs and six lncRNAs were identified as potential prognostic molecules. Compared to nine-mRNA status risk score model, six-lncRNA methylation risk score model, and six-lncRNA status risk score model, the nine-mRNA methylation risk score model showed superiority for prognosis stratification of ccRCC patients in the training set. The prognostic ability of the nine-mRNA methylation risk score model was validated in the validation set. The nine prognostic mRNAs were functionally associated with neuroactive ligand receptor interaction and inflammation-related pathways. Conclusion The nine-mRNA methylation signature (DMRTA2, DRGX, FAM167A, FGGY, FOXI2, KRTAP2-1, TCTEX1D1, TTBK1, and UBE2QL1) may be a useful prognostic biomarker and tool for ccRCC patients. The present results would be helpful to elucidate the possible pathogenesis of ccRCC.


Introduction
Clear cell renal cell carcinoma (ccRCC) is the most common histological subtype of renal cell carcinoma which accounts for more than 90% of cancers in the kidney [1]. ccRCC is characterized by high rates of progression and mortality, and patients have a dire prognosis [2]. Traditional clinical predictors of outcome such as clinical stages, grade, and necrosis could not achieve satisfactory results for ccRCC patients with similar clinical features but different outcomes [3]. Therefore, novel sensitive prognostic biomarkers for ccRCC are demanded.
Aberrant DNA methylation is a critical factor for initiation and progression of cancer [4]. Associations of DNA methylation with prognosis of ccRCC patients are gaining attention. For instance, based on genome-wide CpG methylation profiling, Wei et al. develop a five-CpG-based classifier as a reliable tool for survival prediction of ccRCC patients [5]. Using The Cancer Genome Atlas (TCGA) data, a recent study by Chen et al. finds that a four-gene DNA methylation signature is tightly related to prognosis of patients with kidney renal clear cell carcinoma and is able to serve as a prognostic predictor [6]. Moreover, a promoter methylation classifier panel of 172 differentially methylated CpGs is reported to be capable of classifying prognosis of nonmetastatic ccRCC patients [7]. Long noncoding RNAs (lncRNAs) modulate gene expression at various levels and facilitate cancer phenotypes via interaction with other cellular macromolecules, such as DNA, RNA, and protein [8,9]. Important implications of lncRNAs have been demonstrated for tumorigenesis of ccRCC [10]. A large number of lncRNAs gain DNA methylation in ccRCC, and several lncRNAs with promoter methylation are related to outcome of patients [11]. However, aberrantly methylated lncRNAs which can function as prognostic biomarkers for ccRCC patients remain elusive.
In this paper, we tried to identify potential mRNA methylation signature and lncRNA methylation signature for survival prediction of ccRCC patients through a comprehensive analysis of mRNA and lncRNA methylation data downloaded from TCGA dataset, and the predictive performances of methylation or status risk scores were compared based on these prognostic lncRNAs or mRNAs to select the best prognostic model for ccRCC.

Datasets and Preprocessing.
Methylation and gene expression data of 309 ccRCC tumor tissue samples and 24 normal control tissue samples were downloaded from TCGA repository using Illumina Infinium Human Methylation 450 BeadChip platform and Illumina HiSeq 2000 RNA Sequencing platform. Patients without complete survival information were excluded from the study, and the resulting 306 patients with ccRCC tumor tissue samples were used as the training set in the current study. E-MTAB-3274 [12] dataset consisting of methylome data of 107 ccRCC samples was obtained from EBI ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) based on Illumina Infinium Human Methylation 450 BeadChip platform. Of these samples, 102 samples had corresponding survival information and were used as the validation set.
According to probe sites and ID information, we identified lncRNAs and mRNAs from TCGA set and E-MTAB-3274 dataset based on HUGO Gene Nomenclature Committee (HGNC) [13] database (http://www.genenames .org/) that enrolls 4112 lncRNAs and 19201 protein coding genes. As a result, 13919 mRNAs and 1028 lncRNAs were identified.

Differential Methylation Analysis between Good and Bad
Prognosis Samples. In the training set, differentially methylated lncRNAs and mRNAs (DMLs and DMMs) were identified between good prognosis patients (being alive with OS time lasting for more than 60 months) and bad prognosis patients (being dead with OS time lasting for less than 36 months). Significant DMLs and DMMs were defined as those with FDR < 0:05 and |log 2 FC | >0:263. Next, two-way hierarchical clustering analysis of these significant DMLs and DMMs was carried out based on centered Pearson correlation algorithm.

Correlation Analysis between DNA Methylation and
Expression of Significant DMLs and DMMs. We analyzed relationships of overall methylation and expression levels of preselected significant DMLs and DMMs through calculating Pearson correlation coefficient (PCC) and Spearman correlation coefficient (SCC) [14]. Subsequently, PCC of methylation and expression levels of each individual DML or DMM was computed. These DMLs and DMMs with significant negative correlations (p value <0.05) were selected to be applied in further analysis.
2.4. Prognostic Risk Scoring Model Building. In order to construct risk scoring models for predicting survival in ccRCC patients, we employed a step-by-step strategy to identify the prognostic DMLs and DMMs in the training set. Firstly, we performed univariable Cox regression analysis to assess the association between methylation levels of the aforementioned DMLs and DMMs with patients' OS time. The resulting lncRNAs and mRNAs were considered significant if their p values <0.05. Secondly, the significant lncRNAs and mRNAs were chosen to undergo a multivariable Cox regression analysis. The significant RNAs (p value <0.05) were determined to be independent predictors of prognosis and were then used to fit a L1-penalized (LASSO) Cox-PH regression [15] model with estimation of optimal lambda value through performing 1,000 cross-validations. The optimal feature combination was selected using LASSO regularized regression algorithm in penalized package. CV and lambda values were obtained by feature filtering. The small lambda value usually resulted in small penalty term value as well as over fitting phenomenon; therefore, we chose the largest lambda value which is corresponding to the lambda parameter value. Eventually, the resulting optimal set of predictive lncRNAs and mRNAs was applied to build risk scoring models based on their methylation levels or status together with the estimated multivariable Cox regression coefficients. Methylation status of an individual lncRNA or mRNA was defined according to optimal cutoff point of methylation value (Monte-Carlo p value <0.05) which was determined based on patients' survival through performing X-Tile Bio-Informatics Tool [16] (high expression, expression status = 1; low expression, expression status = 0). We utilized the following equations to quantify methylation or status risk scores for survival prediction: Status risk score = ∑β RNAn × status RNAn, Methylation risk score = ∑ðβ RNAn × methylation RNAnÞ.
β RNAn suggests regression coefficient of RNAn; status RNAn and methylation RNAn suggest methylation status and values of RNAn, respectively.
Risk score was estimated for each patient using above formula. With median risk score as cutoff point, the training set or the validation set was dichotomized into a high-risk group and a low-risk group. 2.6. Functional Enrichment Analysis. According to the optimal risk score, the training set was divided into a high-risk group and a low-risk group. Between the two risk groups, differentially expressed genes (DEGs) were identified based on the threshold of FDR < 0:05 and |log 2 FC | >0:5 and then underwent KEGG pathway enrichment analysis. Significant signaling pathways were selected in the present study. Based on methylation value, ccRCC patients were categorized into two groups.

Disease Markers
Results of ROC curve analysis were depicted in Figures 2 and 3 and Table 3. AUC values of nine-mRNA methylation risk score for the training set and the validation set were 0.987 and 0.884, respectively, which are higher compared to the other risk scores. All above observations suggest that the nine-mRNA methylation risk score performs better than other risk scores for survival prediction of ccRCC patients.

Characterization of Four Independent Prognostic
Features and Construction of a Composite Nomogram. The univariable Cox regression analysis in the training set indicated that age, histologic grade, pathologic N, pathologic T, pathologic stage, hemoglobin, platelet qualitative, and nine-mRNA methylation risk score were significantly related to patients' OS time (p value <0.05, Table 4). were found to be independent predictive factors of prognosis in multivariable Cox regression analysis (Table 4, Figure 4). It revealed that prognostic value of the nine-mRNA methylation risk score model was independent of clinical factors. To integrate the nine-mRNA methylation risk score with the three prognostic clinical variables, a composite nomogram was built (Figure 5(a)). Calibration curves for 5-year survival probability showed sound agreement between predicted and actual probabilities of 5-year survival ( Figure 5(b)).

Functional Implications of the Nine Prognostic mRNAs.
To investigate functional characteristics of the nine prognostic mRNAs in ccRCC biology, we performed functional enrichment analysis on the identified DEGs between the high-risk group and the low-risk group of the training set classified by the nine-mRNA methylation risk score. We found 715 DEGs between the two risk groups, consisting of 190 downregulated genes and 525 upregulated genes (Figures 6(a) and 6(b)). These genes were significantly involved in cytokine-cytokine receptor interaction, neuroactive ligand receptor interaction, toll-like receptor signaling pathway, and cell receptor signaling pathway (Table 5).

Discussion
Considering the prognostic potential of DNA methylation for ccRCC, the present work conducted a genome-wide analysis of mRNAs and lncRNAs methylation profiling of ccRCC samples from TCGA systematically, with a view to discover novel and reliable prognostic biomarkers. There were 461 DMMs and 63 DMLs between good prognosis and bad prognosis patients. Among which, 24 DMMs and 7 DMLs were   identified to be independent predictive factors by uni-and multivariable Cox regression analyses. Eventually, a nine-mRNA methylation signature and a six-lncRNA methylation signature were determined by LASSO Cox-PH regression model. Four prognostic prediction systems were established, and four risk score models were generated, including nine-mRNA methylation risk score model, nine-mRNA status risk score model, six-lncRNA methylation risk score model, and six-lncRNA status risk score model. The nine-mRNA methylation risk score model performed better than other DMLbased risk scores in predicting survival outcome of ccRCC patients. This study suggests that the nine-mRNA methylation risk score is reliable and powerful for classifying ccRCC patients according to prognosis.
Our study identified a prognostic nine-mRNA methylation signature, comprising of DMRTA2, DRGX, FAM167A, FGGY, FOXI2, KRTAP2-1, TCTEX1D1, TTBK1, and UBE2QL1. DMRTA2, a member of the DMRT family, is involved in central nervous system development [21]. Methylation of DMRTA2 is associated with survival of patients with triple-negative breast cancer [22]. DRGX encodes dorsal root ganglia homeobox protein that is a transcription factor implicated in embryonic development of the nervous system. There is evidence that DGRX may be related to serum concentration of monocyte chemoattractant protein-1, an inflammation biomarker, in subjects treated with fenofibrate, an anti-inflammatory and triglyceride-lowering drug [23]. FAM167A, also namely C8orf13, belongs to the FAM167   [24,25]. Low expression of FAM167A is related to longer survival time of ccRCC patients [26]. FGGY encodes protein FGGY carbohydrate kinase domain containing proteins that phosphorylate carbohydrates [27]. FGGY is significantly upregulated during neurogenic skeletal muscle atrophy [28]. FOX family members play roles in regulating cell growth and differentiation [29], and FOXI2 expression is associated with survival of ccRCC patients [30]. KRTAP proteins are major components of hair shaft, playing crucial roles in human hair shaft keratinization [31]. KRTAP2-1 shows an association with chemosensitivity of colorectal cells to oxaliplatin [32]. TCTEX1D family members contain a conserved domain similar to C-terminus of TCTEX1, with little information regarding their biological function [33]. TCTEX1D1 is found to be hypermethylated and downregulated in endometrial cancer [34]. Tau Tubulin Kinase 1 encoded by TTBK1 is implicated in regulating microtube assembly and stabilization. TTBK1 phosphorylation activity is linked to several neurodegenerative diseases like Alzheimer's disease [35]. UBE2QL1 protein is related to protein ubiquitination. There is evidence that UBE2QL1 is a candidate renal tumor suppressor gene [36]. Aberrant methylation of UBE2QL1 has been found in HBV-related hepatocellular carcinoma [37]. As far as we know, for the first time, the nine differentially methylated mRNAs are reported to be of prognostic value for ccRCC patients.
The training set of this study was dichotomized by the nine-mRNA methylation risk score model into a high-risk  Disease Markers group and a low-risk group. Function enrichment analysis showed that the DEGs between the two risk groups were significantly enriched in cytokine-cytokine receptor interaction, neuroactive ligand receptor interaction, toll-like receptor signaling pathway, and cell receptor signaling pathways. Mounting evidences demonstrate that neuroactive ligand receptor interaction pathway is an important pathway for ccRCC [38,39]. Toll-like receptors are critical participants in inflammation responses. Cancer-related inflammation has been accepted as an important hallmark of cancer [40]. These findings reveal that the nine prognostic mRNAs may be involved in neuroactive ligand receptor interaction and inflammation-related pathways. Further functional annotation of these mRNAs is needed to deepen our understanding of their biological implications in ccRCC prognosis. It should be noted that this study is an extensive bioinformatics study based on published data. The results of these studies should also be further validated in vitro or in vivo models. We hope that these useful clues will help other researchers to carry out relevant research.

Conclusion
We report and validate predictive value of a nine-mRNA methylation risk score model for risk stratification of ccRCC patients. The nine-mRNA methylation signature, including DMRTA2, DRGX, FAM167A, FGGY, FOXI2, KRTAP2-1, TCTEX1D1, TTBK1, and UBE2QL1, may be a useful prognostic biomarker for ccRCC patients. Further experimental investigations are required to confirm this prognostic signature.

Data Availability
All data used and/or analyzed in this study are available from TCGA database (https://portal.gdc.cancer.gov) or the EBI Array database (https://www.ebi.ac.uk/arrayexpress/).

Conflicts of Interest
The authors declare that they have no competing interests.   Figure S1: the lambda parameter curve (left) and coefficient distribution chart (right) of (a) lncRNA and (b) mRNA. The lambda parameter curve was gained basing on crossvalidation likelihood screening. The horizontal axis and vertical axis represent the different values of lambda and CVL, respectively, and the intersection of red dotted lines indicates the value of lambda parameters when CVL reaches the maximum value. The coefficient distribution charts for optimal prognosis were screened basing on Cox-PH model using L1-penalized regularized regression algorithm. (Supplementary Materials)