A signature of hypoxia-related factors reveals functional dysregulation and robustly predicts clinical outcomes in stage I/II colorectal cancer patients

Background The hypoxic tumor microenvironment accelerates the invasion and migration of colorectal cancer (CRC) cells. The aim of this study was to develop and validate a hypoxia gene signature for predicting the outcome in stage I/II CRC patients that have limited therapeutic options. Methods The hypoxic gene signature (HGS) was constructed using transcriptomic data of 309 CRC patients with complete clinical information from the CIT microarray dataset. A total of 1877 CRC patients with complete prognostic information in six independent datasets were divided into a training cohort and two validation cohorts. Univariate and multivariate analyses were conducted to evaluate the prognostic value of HGS. Results The HGS consisted of 14 genes, and demarcated the CRC patients into the high- and low-risk groups. In all three cohorts, patients in the high-risk group had significantly worse disease free survival (DFS) compared with those in the low risk group (training cohort—HR = 4.35, 95% CI 2.30–8.23, P < 0.001; TCGA cohort—HR = 2.14, 95% CI 1.09–4.21, P = 0.024; meta-validation cohort—HR = 1.91, 95% CI 1.08–3.39, P = 0.024). Compared to Oncotype DX, HGS showed superior predictive outcome in the training cohort (C-index, 0.80 vs 0.65) and the validation cohort (C-index, 0.70 vs 0.61). Pathway analysis of the high- and low-HGS groups showed significant differences in the expression of genes involved in mTROC1, G2-M, mitosis, oxidative phosphorylation, MYC and PI3K–AKT–mTOR pathways (P < 0.005). Conclusion Hypoxic gene signature is a satisfactory prognostic model for early stage CRC patients, and the exact biological mechanism needs to be validated further.


Background
Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide, and ranks third in terms of morbidity and mortality [1]. About half of the CRC patients are at stages I/II, and more than a quarter of the early-stage patients (I-III) relapse after initial treatment [2]. Hypoxia is a common feature of solid tumors, and contributes to tumor progression and therapeutic recalcitrance. Intra-tumoral hypoxia is considered to be an indicator of poor prognosis [3,4], and even regulates genes involved in the invasion and migration of CRC cells, which is consistent with recent reports indicating an association between lack of oxygen and distant metastasis [5][6][7]. Hypoxia reduces the efficacy of not only surgical resection [8], but also radiotherapy and chemotherapy [9,10]. Only limited options are available at present for hypoxia-related targeted therapies, and there is no unequivocal evidence from clinical trials as yet regarding their efficacy, likely due to the lack of individual-based treatment [8,11,12]. Therefore, an accurate and noninvasive method is needed to assess tumor hypoxia. To this end, we identified a hypoxia-related gene signature (HGS) from CRC-specific transcriptomes through highthroughput expression analysis. The HGS demarcated the stage I/II CRC patients into distinct prognostic groups, and functional and pathway analyses provided new insights in the mechanism of CRC recurrence.

Patients
The gene expression profiles of CRC tissue samples obtained from six public cohorts, including 309 CRC patients from the CIT/GSE39582 gene microarray dataset that served as the discovery cohort, were retrospectively analyzed. The two largest individual data sets-CIT/GSE39582 and The Cancer Genome Atlas (TCGA)-were used for training and independent validation. The meta-validation cohort consisted of the remaining four microarray data sets-GSE14333, GSE17536, GSE37892 and GSE33113-which were obtained from the Gene Expression Omnibus (GEO) database. All datasets are from the GPL570 platform ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array). TCGA cohort data was downloaded from Broad GDAC Firehose, and the other data sets were obtained directly in their processed format from the GEO database through Bioconductor package 'GEOquery' . Transcripts per million (TPM) of level 3 RNA-Seq data in log2 scale was applied to calibrate the gene expression levels in TCGA cohort. The 'combat' algorithm of the R package 'sva' and the z-scores were used to correct the batch effects, in order to standardize microarray data across multiple experiments and compare them independent of the original hybridization intensities. The data of 1877 CRC patients enrolled from Sep 27 to Dec 26, 2018 was also included.

Construction and validation of HGS
To construct a prognostic HGS, annotated functional database MSigDB (version 6.2) [13][14][15] was used to identify a list of hypoxia-related genes with the keyword "hypoxia", and the HGSs measured by all platforms were selected. The log-rank test was used with 1000 randomizations (80% of samples each time) to evaluate the association between each HGS and clinical outcome in the training dataset. Genes that were repeatedly significant were selected as the candidates of the hypoxia signature. To minimize the risk of over-fitting, Cox proportional hazards regression model was applied with the least absolute shrinkage and selection operator (LASSO) (glmnet, version 2.0-16). The penalty parameter was estimated by tenfold cross-validation in the training data set at 1 SE beyond the minimum partial likelihood deviance.
A time-dependent receiver operating characteristic (ROC) curve (survival ROC, version 1.0.3) at 5 years was plotted using Kaplan-Meier estimation, and used to determine the optimal HGS cutoff to separate patients in the training data set into the low-risk and high-risk groups. The HGS corresponding to the shortest distance between the ROC curve and the point representing 100% true positive rate and 0% false-positive rate was used as the cutoff value. Univariate analysis was used to evaluate the prognostic value of the HGS in stage I/II CRC patients, and in patients at all stages in the training and independent validation cohorts. In the multivariate analyses, HGS was combined with other clinical and pathological variables.

Functional annotation and analysis
To investigate the biological characteristics of the HGS, enrichment analysis was conducted for differentially expressed genes (DEGs) between the risk groups in TCGA CRC data set using R package 'gProfileR' . Gene Set Enrichment Analysis (GSEA) was further performed using Bioconductor package 'HTSanalyzeR' to predict the significant pathways [16].

Statistical analysis
All statistical analysis was performed in R software (version 3.5.1; http://www.Rproj ect.org). Descriptive statistics were computed for all variables, and expressed as mean ± standard deviation (SD) or medians and interquartile ranges (IQR) for continuous factors, and as frequencies for categorical factors. Continuous values were compared using Student-t tests between different groups. Log-rank test was used to evaluate results of the univariate analysis of HGS and other clinico-pathological factors with disease free survival (DFS). Multivariate analysis was performed with the Cox proportional hazards regression model. The C-index was calculated by 'survcomp' (version 1.32.0). P values less than 0.5 were considered statistically significant.

The establishment of HGS
We analyzed the CIT gene microarray dataset (GSE39582) and created the discovery subset with 309 eligible CRC patients ( Fig. 1). After exclusion of genes with MAD > 0.5 and less median expression, 1636 genes were retained for further analysis. Following selection of 80% of the repeatable genes via 1000 random Cox univariate regressions, we identified 106 genes that were associated with DFS, of which 14 hypoxia-related genes were selected to construct the HGS using LASSO Cox regression for stage I/II CRC patients (Fig. 2). The risk scores were calculate using the formula derived from the Based on time-dependent ROC curve analysis, the optimal cutoff of HGS for stratifying patients in the training set into the high and low risk groups was determined to be a satisfactory RFS cutoff at 5 years (Fig. 3b, e and h). The incidence of tumor recurrence was higher among the patients in the high-risk group compared to the lowrisk group when the entire CIT dataset (n = 566) was used as a training cohort (Fig. 4a, P < 0.001).

Discussion
The current therapeutic modality for early stage CRC is surgical resection. Nevertheless, the recurrence rate of stage I/II CRC patients after surgery is still higher than 20% [17]. Despite identifying numerous genes that affect the recurrence and metastasis of CRC [18,19], no prognostic gene signature has been validated so far. Effective prognostic biomarkers are therefore urgently needed to predict the DFS rate and risk of relapse after treatment in early-stage CRC patients. In this study, we developed Fig. 4 a Distribution of the HGS risk score and its correlation to recurrence in the training, TCGA cohort and meta-validation cohort, with risk scores as the continuous variable for individual patients. The DFS and recurrence in the different hypoxia risk groups of training cohort (b), TCGA cohort (e) and meta-validation cohort (h). Kaplan-Meier curves comparing survival of patients with low or high hypoxia risk in training cohort (c), TCGA cohort (f) and meta-validation cohort (i). P-values were calculated using log-rank tests and HR is short for hazard ratio a novel predictive hypoxia-related 14-gene signature for CRC, and validated it in multiple cohorts. The results suggest that the HGS can successfully predict the DFS of CRC patients after treatment. Oxygen provides energy for cell growth and division, and is a key signaling molecule. The hypoxia inducible factors (HIFs) respond to changes in oxygen levels and cellular energy status, and trigger a transcriptional program [20] that mediates malignant transformation and progression. Not surprisingly, lack of oxygen and overexpression of HIF is associated with poor prognosis in cancer patients [21,22]. Furthermore, tumor cells induce pro-angiogenic factors to vascularize the tumor in order to survive and proliferate under hypoxic condition, which are regulated along with the hypoxiarelated genes [23]. In fact, HIF inhibitors also improve the efficacy of anti-angiogenesis drugs during cancer treatment [21,22,24]. Consistent with these previous studies, we found that hypoxia-related genes worsened CRC prognosis by affecting genes involved in the cell cycle, indicating that hypoxia-related drug targets can potentially improve CRC prognosis.
Several studies have shown an association between tumor hypoxia and poor therapeutic outcome in cancer patients. Oxygen deficiency reduces the efficacy of surgical resection and increases metastatic potential of tumors [25,26]. The current endogenous markers of hypoxia cannot accurately monitor intra-tumor oxygen levels, which limits the efficacy of hypoxia-targeting drugs [8,27]. The HGS stratified the stage I/II CRC patients into high-and low-risk groups that differed significantly in terms of DFS during a 5-year follow-up. The C index results of the 14-gene hypoxia signature showed its clinical superiority to Oncotype DX. This novel prognostic tool can thus identify CRC patients with highly hypoxic tumors that at risk of treatment failure, and enable clinicians to make informed decisions regarding treatment regimens. It may also help    [27][28][29]. Therefore, there is an urgent need for better therapeutic targets to improve the prognosis of CRC patients. We observed a significant enrichment of cell cycle/metabolism-related genes and functions, such as mTROC1, E2F, G2-M, mitosis, oxidative phosphorylation, MYC and PI3K-AKT-mTOR (P < 0.005), in the high-risk, low DFS group. Previous studies have found a correlation between these targets and CRC development, although they did not link these targets to tumor hypoxia [30][31][32][33][34][35]. Further studies are needed to clarify the effects of hypoxia on cell cycle in order to identify more targets and improve the prognosis of early stage CRC patients.
In conclusion, we identified a prognostic hypoxiaassociated gene signature using genome-wide analysis to predict DFS in patients with stage I/II CRC. These hypoxia-associated DEGs are potential therapeutic targets against CRC. However, our study is beset with the limitations associated with all retrospective studies, in addition to systematic errors resulting from analyzing samples from disparate databases. Therefore, further clinical and pharmacological tests are needed to validate our results.

Conclusions
We developed a novel HGS to stratify stage I and II CRC patients into high-and low-risk groups with greater accuracy compared to the currently used clinicopathological risk factors. A "risk prediction model" was also constructed using the HGS, the scores of which can be readily applied to independent prospective cohorts. HGS is a highly promising prognostic tool for personalized treatment regimens and clinical management of stage I/ II CRC patients.