Identification of Vitamin D-related gene signature to predict colorectal cancer prognosis

Colorectal cancer (CRC) is one of the most common malignant carcinomas worldwide with poor prognosis, imposing an increasingly heavy burden on patients. Previous experiments and epidemiological studies have shown that vitamin D and vitamin D-related genes play a vital role in CRC. Therefore, we aimed to construct a vitamin D-related gene signature to predict prognosis in CRC. The CRC data from The Cancer Genome Atlas (TCGA) was performed as the training set. A total of 173 vitamin D-related genes in the TCGA CRC dataset were screened, and 17 genes associated with CRC prognosis were identified from them. Then, a vitamin D-related gene signature consisting of those 17 genes was established by univariate and multivariate Cox analyses. Moreover, four external datasets (GSE17536, GSE103479, GSE39582, and GSE17537) were used as testing set to validate the stability of this signature. The high-risk group presented a significantly poorer overall survival than low-risk group in both of training set and testing sets. Besides, the areas under the curve (AUCs) for signature on OS in training set at 1, 3, and 5 years were 0.710, 0.708, 0.710 respectively. The AUCs of the ROC curve in GSE17536 for 1, 3, and 5 years were 0.649, 0.654, and 0.694. These results indicated the vitamin D-related gene signature model could effectively predict the survival status of CRC patients. This vitamin D-related gene signature was also correlated with TNM stage in CRC clinical parameters, and the higher risk score from this model was companied with higher clinical stage. Furthermore, the high accuracy of this prognostic signature was validated and confirmed by nomogram model. In conclusion, we have proposed a novel vitamin D-related gene model to predict the prognosis of CRC, which will help provide new therapeutic targets and act as potential prognostic biomarkers for CRC.


INTRODUCTION
Colorectal cancer (CRC) remains the third most common malignant tumor worldwide and is the second one in cancer-related deaths (Ferlay et al., 2018). In China, CRC is the third one in annual incidence and also the leading cause of tumor-related deaths (Zhang et al., 2019). The overall survival of CRC is unsatisfactory, and the five-year overall survival rate is just over 50% (Frampton & Houlston, 2017). CRC is highly malignant and causes a tremendous economic burden on patients (Luengo-Fernandez et al., 2013). Therefore, the poor prognosis and the increasing incidence of CRC have provided strong motivation to construct a predictive model in CRC patients, which will benefit personalized treatment in clinical management.
Vitamin D is a fat-soluble vitamin and there are many genes related to its metabolism and action (Fedirko et al., 2019). Vitamin D can be obtained from diet or the endogenous synthesis in the epidermis under sunlight exposure (Saraff & Shaw, 2016). It has been demonstrated that vitamin D benefits clinical outcome and improves the long-term survival of CRC patients (Xu et al., 2021). Besides, circulating vitamin D may be a CRC biomarker and vitamin D deficiency is closely related to the high incidence of CRC (He et al., 2018;Meeker et al., 2016;Wu et al., 2020). Vitamin D has been reported to inversely correlated with CRC and has a protective effect on CRC in clinical studies (Jenab et al., 2010;Manousaki & Richards, 2017;Meeker et al., 2016). Mechanically, vitamin D inhibits tumor growth and affects CRC development in animal models (Hummel et al., 2013;Newmark et al., 2009). Besides, it has been reported that vitamin D inhibits proliferation and promotes differentiation in CRC cells (Leyssens et al., 2015). Many genes are related to vitamin D metabolism and action, play an essential role in tumors; for example, CYP24A1, an important vitamin D-related gene, was up-regulated in CRC patients and nominated as a promising biomarker (Sadeghi & Heiat, 2020). Vitamin D and its related genes are correlated with the homeostasis of the intestinal epithelium, regulate immune cells, and may protect against colon cancer (Cantorna, Snyder & Arora, 2019). In a word, vitamin D and its related genes were closely correlated with CRC tumor pathogenesis.
Prognostic model based on gene set, with a higher predictive value than a single gene, was constructed in previous studies to predict the clinical outcome of patients. However, it has not been explored whether vitamin D-related gene signature could be biomarkers for the prognosis of CRC. A systematic functional study of vitamin D-related genes in CRC will contribute to our more profound understanding of vitamin D-related genes and provide new ideas for the pathogenesis of CRC. We aimed to clarify whether vitamin D-related genes can help CRC patients predict the prognosis and provide a better diagnosis and treatment. Therefore, we focused on constructing a prognostic signature based on vitamin D-related genes.

Data collection
The studies involving human participants were reviewed and approved by the Medical ethics committee of Zhongnan Hospital of Wuhan University. Gene expression matrix data of CRC samples were downloaded from The Cancer Genome Atlas (TCGA) data portal (https://portal.gdc.cancer.gov) and the Gene Expression Omnibus (GSE17536, GSE103479, GSE39582, and GSE17537) data pool (GEO, https://www.ncbi.nlm.nih.gov/geo/) as training and external validation sets, respectively (up to December 16, 2020) (Sun et al., 2019. Transcriptome information for CRC was included. These were the inclusion criteria in this study: (1) The samples with both mRNA sequencing data and survival status from the CRC patients; (2) The samples with complete clinical information. The expression data of TCGA of 480 CRC and 41 non-tumorous tissues was the HT Seq-FPKM type and has been normalized. In TCGA database, there were 453 CRC patients with overall survival status, the expression matrix was arranged with "dplyr" package (Günalan, Cebioğlu & Çonak, 2021) and "data.table" (Zhou et al., 2020) package in R. Then, we searched the clinical information of the patients from the database, including survival information, age, gender, tumor stage, and clinical classification; 393 CRC patients with complete clinical information were selected for further analysis. And GEO databases (GSE17536, GSE103479, GSE39582, and GSE17537), containing 177, 155, 562, and 55 CRC patients, were normalized using "limma" package in R. A total of 194 vitamin D-related gene lists were obtained from two literature pieces (Fedirko et al., 2019;Pytel et al., 2019). Among them, 173 vitamin D-related genes were identified from TCGA.

Construction and validation of the prognostic vitamin D-related gene signature for CRC
The 453 CRC samples from TCGA were downloaded as training set. To establish the prognostic model, univariate and multivariate Cox analysis was applied to examine the predictive value of vitamin D-related genes in CRC patients with overall survival by using the "survival" package in R (Tang et al., 2017). A P-values of < 0.05 were considered significant and shown by a forest plot. After acquiring the prognostic vitamin D-related genes of CRC, genes with a hazard ratio (HR) > 1 were identified as high-risk genes, while HR < 1 was identified as low risk. The model was utilized to evaluate the correlation between overall survival (OS) and vitamin D-related genes. Finally, risk scores were calculated based on the corresponding Cox regression coefficient and the expression levels of 17 genes of 453 CRC patients in TCGA, and the formula was shown as risk score = expression of vitamin D-related gene 1 × coefficient 1 + expression of vitamin D-related gene 2 × coefficient 2 + vitamin D-related gene n × coefficient n. Patients were divided into high and low-risk groups based on the median risk score. Kaplan-Meier (KM) analysis was used to compare survival between high-risk and low-risk groups with "survival" package and "survminer" package in R. To verify the stability of the signature, the receiver operating characteristic (ROC) analysis was performed, and the AUC was calculated by the R package "survivalROC" (Wang et al., 2018). Then, we downloaded four datasets (GSE17536, GSE103479, GSE39582, and GSE17537) in the Gene Expression Omnibus database (GEO database) to validate the model.

Independent prognostic analysis and clinical correlation analysis
The clinical information was selected including age, gender, clinical stage, tumor-nodemetastasis (TNM) status. To evaluate whether the risk score is an independent prognostic factor associating with overall survival, univariate and multivariate independent prognostic analyses were performed with the "survival" package. A p-value of < 0.05 was considered significant statistically. Then, time-dependent ROC curve was analyzed, and the area under the curve (AUC) was calculated using the "survivalROC" package. A clinical relevance analysis was conducted using the "beeswarm" package (Bi et al., 2019) to assess the correlation between prognostic vitamin D-related genes and clinical information. All results were considered to be statistically significant at p < 0.05.

Nomogram model construction
A nomogram based on the results of previous multivariate Cox regression analyses was constructed by using the R language "rms" package and "hmisc" package Wells et al., 2018). Calibration curve were generated to analyze the agreement between nomogram and ideal observation. By quantifying the net benefits under different threshold probabilities, a decision curve analysis is performed to evaluate the clinical utility of the predictive nomogram.

Statistical analysis
Statistical analysis was made using R software (Chan, 2018), with version number v4.0.3. Kaplan-Meier analysis was applied to estimate the overall survival rate of different groups, and log-rank was used to test the significance of the difference between other groups. The plot and the heatmap were generated using the R package "ggplot2" (Günalan, Cebioğlu & Çonak, 2021) and "pheatmap" (Zhang et al., 2020).

Construction and validation of vitamin D-related prognostic model for CRC
The complete clinical information of 393 CRC patients was collected including survival status, survival time, age, gender, clinical stage, and TNM stage. A prognostic model was developed based on above identified 17 genes by performing multivariate Cox analyses ( Table 1). The CRC patients were divided into a high-risk group (n = 226) and a low-risk group (n = 227) according to the risk score's median cutoff (Fig. 3A). The ranked risk scores of patients and their survival status in the training set and the testing sets were plotted, respectively. Riskscore = (0.237883456 Ã CYP24A1) The results represented that patients in the high-risk group showed a higher mortality rate than those in a low-risk group in both of training set and testing sets.

Predictive performance of the vitamin D-related gene signature
Kaplan-Meier survival analysis was performed between high-and low-risk groups in both of training set and testing sets to investigate the prognostic value of the vitamin D-related gene risk signature. The high-risk group had a significantly poorer OS than that of the low-risk group in the training set (Fig. 3A) and the testing sets (Figs. 4A, 4C, 4E, 4G). The ROC analysis was performed to evaluate the predictive efficiency of the vitamin D-related gene signature. The AUCs for the signature on OS at 1, 3, and 5 years were 0.710, 0.708, 0.710 in the training set (Fig. 3B). The AUCs for the signature on OS at 1, 3, and 5 years were 0.649, 0.654, and 0.694 in the GSE17536 (Fig. 4B), 0.676, 0.600, 0.588 in GSE103479 (Fig. 4D), 0.595, 0.675, 0.672 in GSE39582 (Fig. 4F), and 0.896, 0.923, 0.861 in GSE17537 (Fig. 4H). These results showed that the model could predict the survival status  suggested that the differences in the distribution of the clinical stage and TNM classification in the two groups were statistically compelling (Table 2). Besides, our model was proved to be an independent factor for estimating the prognosis of CRC (Figs. 5A, 5B). The univariate analysis displayed that age (P = 0.002), pathological stage (P < 0.001), TNM classification (P < 0.001), and the vitamin D-related prognostic model (P < 0.001) were significantly correlated with overall survival. The multivariate analysis indicated that vitamin D-related prognostic models (P < 0.001), age (P < 0.001), and T classification (P = 0.046) remained independent prognostic factors related to OS (Table 3). Overall, vitamin D-related prognostic model could effectively predict the survival of CRC patients. Clinical correlation analysis A correlation analysis was carried out between our prognostic model and patients' clinical parameters in all 393 CRC cases. With the gradual increase of clinical stage, T and N classification, the risk score of the prognostic model was getting higher (Figs. 6A-6C), suggesting that the signature was reliable. However, it was not shown that the M classification was associated with clinical traits (Fig. 6D).

Establishment of the nomogram model
To predict the survival probability of CRC patients in 1, 3, and 5 years, a nomogram model was established. Calculating the score of clinical factors and vitamin D-related gene signature, a straight line was generated to evaluate the probability of 1-,3-and 5-year survival at each time point in TCGA dataset. According to this nomogram model, patients in the low-risk group showed a better survival probability. Furthermore, the calibration curve for predicting patient survival presented that the rate of predicted 1-,3-and 5-year survival closely paralleled the actually observed ratio of the training set, respectively

DISCUSSION
It has been reported that vitamin D is closely related to CRC (Zhou et al., 2020). Epidemiology data suggested that vitamin D deficiency is linked to high incidence and/or mortality by CRC (Ferrer-Mayorga et al., 2019). Vaughan's research showed that colorectal cancer (CRC) surgical resection decreased circulating vitamin D, whose level is a prognostic biomarker associated with poor survival (Vaughan-Shaw et al., 2020). Recent research indicated that vitamin D inhibits cell proliferation and maintains the stem cell phenotype by increasing several stemness-related gene expressions (Fernández-Barral et al., 2020). Besides, one of the main vitamin D-related genes, CYP24A1 was found to be negatively associated with the prognosis of prostate cancer (Khan et al., 2019). Therefore, vitamin D-related genes are critical in the development and progression of carcinogenesis. Studies have shown that some vitamin D-related genes are of great importance in CRC (Pálmer et al., 2001). However, only a few vitamin D-related genes have been studied in depth to confirm their tumor progression role. The functions of most vitamin D-related genes are not well understood, so clarifying the possible role of vitamin D-related genes in prognosis will support the further research in this field. Moreover, the 5-year survival rate of CRC patients is not satisfactory (McQuade et al., 2017), so better prognostic models are urgently needed. At present, it is not clear whether vitamin Five RNA sequencing databases were enrolled in the present study, including TCGA, GSE17536, GSE103479, GSE39582, and GSE17537; and clinical characteristics were collected from independent international databases. We identified 17 vitamin D-related genes in CRC from TCGA database. The accuracy of the model was further validated by evaluating the survival analysis, ROC curve, and independent prognostic analysis. The signature consisted of 17 vitamin D-related genes significantly correlated with the overall survival in CRC patients. The result of Kaplan-Meier analysis supported that patients in the high-risk group had poorer overall survival (OS) than those in the low-risk group. Subsequently, we used a time-ROC analysis to test its performance in the training and validation groups at different time points. The results were comparable compared with other studies.
In this study, we developed a novel vitamin D-related gene signature to predict the prognosis of CRC. Tumors with low risk-score displayed substantially better prognosis in both training and testing sets. Kaplan-Meier survival analysis and ROC analysis were utilized to evaluate the predictive efficiency of vitamin D-related gene signature in the training set and testing sets. Moreover, univariate and multivariate Cox regression analysis revealed that this signature is a firm tool that acts as an independent prognostic factor for OS in CRC patients. Results showed relationships between the signature and some clinicopathological characteristics including the clinical stage and T/M/N stages, but the association was not found with gender. The reason may be that there are no differences in vitamin D-related genes in different genders. Incorporated with independent clinical risk factors, the vitamin D-related gene signature presented good performance. Subsequently, based on the vitamin D-related gene prognostic signature, a nomogram was built to predict 1-, 3-, and 5-year OS. Nomogram integrating clinical factors with Vitamin D-related gene signature predicts the prognosis of CRC patients in clinical practice, providing enlightenment in identifying more gene targets for CRC treatment. The calibration curve also showed agreement between model prediction and reality in the training set.
The signature was composed of 17 vitamin D-related genes with prognostic capability. Among them, fourteen vitamin D-related genes (CYP24A1, TGFB1, IGFBP2, IGFBP3, VGF, AGAP2, DENND6B, LRRC8A, BCL6, FCER2, ELL, CD36, AGPAT1, and TMCO6) were associated with high risk and three (NCOA7, GSR, and MPC1) were identified as protective factors. Researches have demonstrated that dysregulation of vitamin D-related pathways contributes to the pathogenesis of CRC (Dou et al., 2016;Sluyter, Manson & Scragg, 2021;Vladimirov et al., 2020). Studies have demonstrated that some vitamin D-related genes of this signature are involved in metabolism, including AGPAT1, VGF, and MPC1 (Agarwal et al., 2017;Stephens et al., 2017;You, Lee & Roh, 2021). As we know, metabolism reprogramming is one of the important characteristics in tumorigenesis (Pavlova & Thompson, 2016); considering their biological function in the regulation of metabolism, it is reasonable to speculate that these genes may be involved in metabolic reprogramming in CRC. In head and neck cancer, MPC1 regulates ferroptosis in vivo and in vitro (You, Lee & Roh, 2021); meanwhile, recent studies have confirmed the significant role of ferroptosis in CRC (Xu et al., 2021); so, it may be promising to further study whether MPC1 can modulate ferroptosis in CRC. Besides, AGPAT1 deficiency mice have reduced plasma glucose levels and body weight and alteration in lipid synthesis (Agarwal et al., 2017). VGF gene expression was lower in obese patients, indicating that VGF may be significantly correlated with obesity (Koc et al., 2021); It has been reported that VGF is associated with energy balance and glucose homeostasis (Ferri et al., 2011;Stephens et al., 2017).
According to our findings, CYP24A1, IGFBP2, and IGFBP3 were related to high risk in CRC. Previous studies also indicated that these genes were identified as important oncogenes in human cancers, including CRC (Ben-Shmuel et al., 2013), pancreatic cancer (Kendrick et al., 2014), breast cancer (Dean et al., 2014), hepatocellular carcinoma (HCC), gastroesophageal cancer, and so on (Jin et al., 2020). It was confirmed that the abnormal expression of CYP24A1 was related to cancer risk and might contribute to tumor aggressiveness (Hu et al., 2019;King et al., 2010). Growing evidence indicates that IGFBP2 plays an essential role in several key oncogenic processes, such as tumor cellular proliferation, epithelial to mesenchymal transition, stemness, invasion, angiogenesis, immunoregulation, and migration (Li et al., 2020). In our signature, CD36, ELL, and MPC1 are important genes associated with CRC prognosis. This finding is consistent with the following experimental results. Rainer Hubmann et al's study revealed that NOTCH3 and CD36 influence the uptake, tissue distribution, and activation of vitamin D (Kiourtzidis et al., 2020); inhibition of CD36 partially reversed the migration promotion effect of CAFs on CRC cells. By reducing CD36 level in vivo, the migration ability of CRC cells is significantly repressed (Fedirko et al., 2019). ELL has been identified as a potential tumor suppressor by interacting with c-Myc and suppresses colon tumor xenograft growth (Chen et al., 2016). It was found that MPC1 was downregulated in CRC and its low expression was correlated with poor prognosis; Mechanically, decreased MPC1 enhanced tumor metastasis by activating the Wnt/β-catenin pathway (Wang et al., 2021). In addition, the function of few genes might not be consistent with our findings. For example, LRRC8A was identified to regress the proliferation of the CRC cells (Fujii et al., 2018;Xu, Wang & Shi, 2020). While LRRC8A is a high-risk factor in our model, so further studies are expected to explore its function in CRC. The role of AGAP2, BCL6, GSR, and FCER2 in CRC is not well explored, but they have been studied in other diseases. AGAP2 is closely related to TGFβ1. In activated hepatic stellate cells (HSC), silencing AGAP2 expression diminished proliferation and migration in response to TGFβ1 and led to pro-fibrotic effects (Navarro-Corcuera et al., 2020). Glutathione reductase (GSR) is a key regulator in disease; GSR-null mice were susceptible to HCC induced by chemicals and the liver had higher DNA damage markers (McLoughlin et al., 2019), which may suggest its protective role in cancer. GSR-KO mice also implicated the GSH system as the main regulator of lung development (Robbins et al., 2021). The main research of BCL6 is focused on lymphoma and it was considered as a therapeutic target (Leeman-Neill & Bhagat, 2018). FCER2 (CD23) surface expression is mutually exclusive in chronic lymphocytic leukemia (CLL) (Hubmann et al., 2020). Moreover, the role of DENND6B, NCOA7and TMCO6 in cancer is largely unknown, our results indicated their important role in colorectal cancer; further studies are expected to reveal their biological function.
To date, there is no published studies using vitamin D-related gene signatures to predict prognosis of CRC. Our study is the first one to establish a vitamin D-related signature for prediction of CRC prognosis. The external verification can increase the credibility of this model, so our research was verified by four external databases (GSE17536, GSE103479, GSE39582, and GSE17537) to further prove its accuracy, which making our results more convincing. Moreover, a large amount of CRC patients enrolled, our research was conducted with a reasonable sample size and achieved ideal performance, making the future applications more extensive. Our research confirmed that vitamin D and vitamin D-related genes play a vital role in CRC and supports that vitamin D-related genes may be potential therapeutic targets. Compared to traditional clinical risk scoring, incorporating our vitamin D-related gene signature with clinical risk factors would benefit prognostic prediction. It may provide a theoretical basis for predicting whether vitamin D supplementation is effective in CRC patients. For the first time, our research showed that vitamin D-related genes are associated with CRC prognosis. It strongly supports the vital function of vitamin D in tumors, suggesting that further studies of the mechanisms of vitamin D and vitamin D-related genes in tumors are needed.
However, we have to mention the following defects in the present study. First of all, the validation of vitamin D-related genes in CRC clinical samples will make our conclusion more reliable. Meanwhile, basic experiments are critical to revealing the mechanism of vitamin D-related genes in CRC.

CONCLUSION
In summary, we first identified and constructed a vitamin D-related gene signature to predict the prognostic outcome of CRC patients, which was an independent prognostic marker for overall survival. Our current study highlighted that vitamin D-related genes played an essential role in the prognosis of CRC and deepened the understanding of vitamin D-related genes. Efforts to reveal the role of vitamin D in CRC will help develop more rational treatment strategies in the future. These findings would help provide new therapeutic targets and prognostic markers for CRC.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This study was supported by the National Natural Science Foundation of China (Qiu Zhao, No. 81870390), and the Zhongnan Hospital of Wuhan University Science, Technology and Innovation Seed Fund (Lan Liu, No. CXPY2017033). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors: National Natural Science Foundation of China: 81870390. Zhongnan Hospital of Wuhan University Science, Technology and Innovation Seed Fund: CXPY2017033.