Idenfication of a novel signature based on mTORC1 pathway for hepatocellular carcinoma.

Background mTORC1 signal pathway play a role in the initiation and progression of hepatocellular carcinoma (HCC), but no relevant gene signature was developed. This research aimed to explore the potential correlation between mTORC1 signal pathway and HCC and establish the related genes signature. Methods HCC cases were retrieved from The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC) and Gene Expression Omnibus (GEO) databases. The genes to be included in mTORC1-assiociated signature were selected by performing univariate, multivariate Cox regression analysis and lasso regression analysis. Then, the signature was verified by survival analysis and multiple receiver operating characteristic (ROC) curve. Furthermore, a nomogram was established and evaluated by C-index, calibration plot and ROC curve. Results The signature was established with the six genes ( ETF1, GSR, SKAP2, HSPD1, CACYBP and PNP ). Under the grouping from signature, patients in the high- risk group showed worse survival than those in the low-risk group in both three datasets. The univariate and multivariate Cox regression analysis indicated that mTORC1 related signature can be the potential independent prognostic factor in HCC. Conclusion The mTORC1 associated gene signature established and validated in our research could be used as a potential prognostic factor in HCC.


Background
Hepatocellular carcinoma (HCC) is one of the most popular cancer around the world, becoming the second leading causes of tumor related death [1]. Due to the high rate of metastasis, HCC patients with advanced stage are usually with a poor prognosis [2]. As the treatment and biomarkers of HCC have developed, the clinical outcomes of HCC patients are still unsatisfactory [3]. The occurrent and development of HCC involved interactions between genetics, epigenetics and transcriptomic alterations [4]. Many studies have verified that different genes have an effect on predicting prognosis of HCC. For example, Gu et al found that CCL14 was a potential prognostic biomarker that correlated with tumor immune cell infiltration in HCC [5]. Another study found the strong correlations between PRPF3 expression and prognosis in HCC [6]. However, as a biomarker, a single gene usually has a lower prognostic performance than multigene prognostic signature. Therefore, many gene related signature have been developed for predicting prognosis of HCC. For instance, Zhang et al established a gene signature associated with HCC microenvironment and successfully verified it [7]. Other predictive signatures based on immune [8] and glycolysis [9] also play an important role in prognosis of HCC.
Generally, the gene researches usually focus on comparing the gene expression between two groups, or pay attention to the highly upregulated and downregulated genes. Nevertheless, some genes which showed no significant difference but had important biological function and characteristics were omitted. In view of this, a computational method, Gene Set Enrichment Analysis (GSEA) determined whether a prior defined set of genes shows statistically significant differences between two biological states [10]. The advantage of GSEA is that it can identify the genes which expression is based on the trend of overall level. Consequently, in this research, we identified the pathway and gene with GSEA.
Then, we constructed the signature based on related genes and verified it, providing the more comprehensive and accurate prognostic model for clinic.

Data collection
The gene expression data with the type of level 3 RNA-seq FPKM dataset and the clinical messages in TCGA website (https://portal.gdc.cancer.gov/) were retrieved. A total of 377 HCC cases have been downloaded and analyzed, which included 50 normal samples and 327 tumor samples. As the cohorts of validation, GSE76427 datasets were retrieved from Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), while ICGC-LIRI dataset was retrieved from International Cancer Genome Consortium (ICGC) database (https://icgc.org/).

Identification of pathways and related genes.
After data collection, we extracted the clinical details and generated the expression matrix of all genes in HCC of TCGA dataset. Then we divided the patients into two groups according to the survival status and employed GSEA to choose the most relevant pathway. The pathways were considered for further analyses if a normalized P-value<0.01. After that, we collected the related genes of involved pathway from Molecular Signatures Database (http://software.broadinstitute.org/gsea/msigdb/index.jsp).

Construction of signature
First, we performed the differentially expressed analysis to select the related genes. The "limma" package under R studio software was employed and a P-value<0.05 was considered statistically significant. Second, we employed the univariate and multivariate Cox regression analysis to choose the prognostic genes among the differentially expressed genes. The genes in this section were eligible for further selection if a P-value<0.05. Third, the lasso regression analysis was executed for checking selected genes. In this analysis, a lasso penalty was applied, to simultaneously accounting for shrinkage and variable selection. The optimal value of the lambda penalty parameter was defined by performing 10 cross-validations. Using the "glmnet" package, the coefficient of each included gene and risk score of each case were calculated. The calculation formula of risk score was following:. The cases were divided into two groups (high risk or low risk), according to the risk score median. To explore the time-dependent prognostic value of our gene signature, the survival analysis was performed using the "survival" package in the R studio software. The relationship between signature and other clinical messages were also evaluated and visualized with a heatmap. Besides, the multiple Receiver Operating Characteristic (ROC) curve was performed for checking the accuracy. Moreover, the univariate and multivariate Cox regression analysis were performed to verify whether the risk score is an independent prognostic factor.

Predictive nomogram design
Using the "rms" package and Cox regression model, a predictive nomogram based on age, gender, stage and risk score was constructed to predict the overall survival (OS) at 3 years and 5 years for HCC patients. Then, we used the Harrell's concordance index (C-index), calibration plot and ROC curve to evaluate the nomogram.

Results
The clinical data details of the patients used in this study are shown in Table 1. Figure 1 show the screening process and validation of our study. The result of GSEA in Figure 2 showed that 6 pathways were eligible and the details of pathways were summarized in Table 2. Considering the highest normalized enrichment score of mTORC1 pathways, we chose the 199 relevant genes in this pathway for further analysis. The differentially expressed analysis showed that 160 genes were significantly different in HCC (Appendix S1). After that, the univariate and multivariate Cox regression analysis

Discussions
In the initiation and progression of hepatocellular carcinoma, genetic factors usually play an important role. Meanwhile, mRNA gene signature based on a certain characteristic like glycolysis [11] and immune [12] have been developed for prognosis of cancers. In this research, we explored specific function to identify genes by comparing survival group and death group in GSEA. According to our results, six signal pathways was found to be highly related to survival and we established the gene signature with mTORC1 signal pathway. As we all known, mTOR pathway is a serine/threonine protein kinase belonging to the PI3K-related kinase family [13], which comprised of two distinct complexes (mTORC1 and mTORC2). With Raptor as its unique and key protein component, mTORC1 plays an important role in cell survival, autophagy, and metabolism [14]. Concerning for the mTOR signal pathway in HCC, it has been found that aberrant mTOR signaling was present in half of the HCC cases [15]. Meanwhile, an intact mTORC1 axis [16] and mTORC2-Akt1 cascade [17] were required for c-Mycdriven hepatocarcinogenesis. Moreover, some researches [15,18] provided the theoretical basis of mTOR signaling pathway-oriented targeting treatment for HCC in clinic. Overall, mTOR signal pathway plays an important role in the development and progression of HCC.
In this study, we identified six genes in signature by performing the differentially expressed analysis, univariate Cox regression analysis and lasso regression analysis. Among our included genes, five genes (ETF1, GSR, HSPD1, CACYBP and PNP) have been found to be related to HCC from previous studies. Singh et al found that ETF1, CNOT6 and XRN1 gene in HepG2 cell led to significant alteration in stability of specific mRNAs and this mechanism may hold novel cancer therapeutic targets [19]. In another research, McLoughlin concluded that GSR, TRXR1, NRF2 and oxidative stress determined hepatocellular carcinoma malignancy [20]. Lee's study [21] found that HSPD1 was down-regulated during early apoptosis of the hepatoma cell mediated by Paeoniae Radix. In terms of CACYBP, it have been verified that CACYBP can promote hepatocellular carcinoma progression in the absence of RNF41 mediated degradation [22]. Moreover, a study [23] found that PNP/fludarabine suicide gene system induced HCC cell apoptosis and inhibited the growth of HCC cells. Although we found no evidence supporting the correlation between SKAP2 and HCC, it has been verified that SKAP2 promotes podosome formation to facilitates tumor-associated macrophage infiltration and metastatic progression [24].
Being different from the previous prognostic studies in HCC, our predictive model firstly concentrated on mTORC1 signal pathway. More importantly, mTORC1 signal pathway was identified by GSEA, which indicated the underlying mechanism between survival of HCC and mTORC1 pathway. Moreover, the validation from three different datasets and a rigorous screening process enabled the identification of a reliable signature. However, our study has some limitations. First, prognostic signature showed a relatively low diagnostic performance, which may be attributed to the inadequate genes (199 related genes) in the initiation of screening process. More genes associated with mTORC1 signal pathways need to be confirmed in the future. Second, using a single characteristic (mTORC1 signal pathway) to establish the predictive model is an intrinsic weakness. Indeed, many other mechanisms, such as metabolism [25] and immune [8], have an effect on the progression of HCC. Furthermore, it's necessary to perform more independent trials and functional experiments to shed light on the mechanism linking mTORC1 signal pathway and HCC.

Conclusion
Our study is the first to identify a novel gene signature related to mTORC1 signal pathway that could be used as a potential prognostic factor in hepatocellular carcinoma. Not applicable.

Consent for publication:
Not applicable.

Competing interests:
The authors declare that they have no competing interests.

Funding:
This study was funded by the National Natural Science Foundation of China (NO. 81873248, NO.

Authors' contributions:
ZM and SZ designed the manuscript. ZM and SL wrote and completed the manuscript. HH, LY and ZC completed the data download and analysis. All the authors approved the final manuscript.  Figure 1 The flowchart of this study.   Survival analysis results of three datasets.  show the multiple ROC curve, panels (c) show the result of univariate Cox regression analysis, and panels (d) show the result of multivariate Cox regression analysis. In the heatmap, symbol "*" in means P<0.05, "**" means P<0.01, "T" represents tumor size, "M" represents distant metastasis, and "N" represents regional lymph node metastasis.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.