Immune gene prognostic signature for disease free survival of gastric cancer: Translational research of an artificial intelligence survival predictive system

Abstract
The progress of artificial intelligence algorithms and the availability of massive data provide new ideas and options for predicting individual mortality risk in cancer patients. The current research aimed to depict an immune gene-related regulatory network and to develop an artificial intelligence survival predictive system for disease-free survival (DFS) of gastric cancer.
The multi-task logistic regression algorithm, the Cox survival regression algorithm, and the random survival forest algorithm were used to develop the artificial intelligence survival predictive system.
Nineteen transcription factors and seventy immune genes were identified to construct a transcription factor regulatory network of immune genes. Multivariate Cox regression identified fourteen immune genes as prognostic markers, which were used to construct a prognostic signature for gastric cancer. Concordance indexes were 0.800, 0.809, and 0.856 for 1-, 3-, and 5-year survival. An artificial intelligence survival predictive system for gastric cancer was developed based on three artificial intelligence algorithms. Gastric cancer patients with a high risk score had poorer survival than patients with a low risk score.
The current study constructed a transcription factor regulatory network and developed two artificial intelligence survival prediction tools for the disease-free survival of gastric cancer patients. These tools are helpful for individualized treatment decisions.

Background
Epidemiological data show that gastric cancer (GC) is one of the leading digestive malignancies and ranks second in tumor-related deaths, with 782,685 deaths in 2018 [1]. Although advances in early screening, diagnosis, and treatment have reduced mortality to some extent [2,3], the prognosis of gastric cancer patients remains unsatisfactory [4]. From a clinical point of view, early identification of GC patients at high mortality risk and more precise individualized treatment could help improve their prognosis. Therefore, reliable and precise prediction of individual mortality risk is of great significance for optimizing individual treatment.
Great progress has been made in precision medicine in recent years [5,6]. Precision medical predictive tools can be used to predict individual mortality risk at different time points and the efficacy of different treatments [7][8][9]. However, existing precision medical predictive tools for the mortality risk of gastric cancer patients do not yet meet the needs of individualized treatment.
Advances in bioinformatics have provided a tremendous impetus to precision medical research on tumorigenesis and progression. Bioinformatics helps to explore the intrinsic biological regulatory mechanisms and potential pathways of tumorigenesis and progression [10][11][12][13]. In recent years, a growing number of studies have focused on the important role of the immune microenvironment in tumorigenesis and progression [14,15]. Jiang et al. developed a prognostic signature for predicting the prognosis of gastric cancer patients [16]. Yang et al. developed a prognostic signature based on immune genes to predict the overall survival of GC patients [17]. However, this prognostic signature did not provide a calculation formula, which limited its clinical application. Therefore, it is valuable to develop individualized precision medical predictive tools for the early identification of gastric cancer patients at high mortality risk.
Precision medical predictive tools can provide individualized mortality risk predictions and help clinicians identify patients at high mortality risk early. Recently, our team developed several precision medical predictive tools based on genetic data for different tumors [18][19][20]. In recent years, the development of artificial intelligence algorithms has provided more options for predictive studies of tumor prognosis. The multi-task logistic regression algorithm, the Cox survival regression algorithm, and the random survival forest algorithm have been used to improve the accuracy of predictive and prognostic models [21][22][23][24][25][26][27][28][29][30][31][32][33][34]. Therefore, the current research aimed to explore the potential immune regulatory mechanisms underlying GC prognosis and to construct artificial intelligence survival prediction tools for predicting individual mortality risk at different time points.

Study datasets
The model dataset was downloaded from the TCGA database and comprised 22,412 mRNAs from 375 GC specimens and 32 normal specimens (TCGA, PanCancer Atlas, Cell 2018, http://www.cbioportal.org/study/summary?id=stad_tcga_pan_can_atlas_2018). Two hundred and sixty-five GC patients were included after removing patients with less than one month of follow-up. The validation dataset (GSE62254) was obtained from the GEO database and contained 279 patients and 19,765 mRNAs (GPL570 platform). Probe IDs were translated to official gene symbols according to the Gencode.v29 background file.
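The two preprocessing steps above (dropping patients with under one month of follow-up and mapping probes to gene symbols) can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: the `followup_months` field, the dictionary layouts, and the rule of averaging several probes that map to one symbol are all assumptions, since the text does not say how duplicate probes were resolved.

```python
from collections import defaultdict
from statistics import mean


def filter_by_followup(patients, min_months=1.0):
    """Keep patients whose follow-up is at least `min_months`.

    `patients` is a list of dicts with a hypothetical "followup_months" key;
    real TCGA/cBioPortal field names differ.
    """
    return [p for p in patients if p["followup_months"] >= min_months]


def collapse_probes(expr_by_probe, probe_to_symbol):
    """Map probe-level expression vectors to gene symbols.

    Probes without a symbol in the (hypothetical) annotation table are dropped.
    When several probes map to the same symbol, their values are averaged
    sample by sample; this averaging rule is an assumption.
    """
    grouped = defaultdict(list)
    for probe, values in expr_by_probe.items():
        symbol = probe_to_symbol.get(probe)
        if symbol:
            grouped[symbol].append(values)
    return {
        symbol: [mean(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }
```

Averaging is only one option; taking the probe with the highest mean expression is another common choice and would change only the body of `collapse_probes`.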

Differentially expressed analyses
Differential expression analyses between GC samples and normal samples were performed with the R package "edgeR" [35]. The original data were normalized by the trimmed mean of M values (TMM) method. P < 0.05 and |log2 fold change| > 1 were set as the cut-off values for differential expression.
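TMM normalization is implemented in edgeR by `calcNormFactors`; the sketch below re-expresses its core idea in Python purely for illustration. It assumes raw count vectors for one sample and one reference library, uses the edgeR default trims (30% of log-ratios and 5% of average log-expression in each tail) with inverse-variance weights, and omits edgeR's choice of reference library and the final rescaling of factors to a geometric mean of one. The helper `passes_de_cutoff` simply encodes the stated cut-off; the P value itself would come from edgeR's statistical test, which is not reproduced here.

```python
import numpy as np


def tmm_factor(sample, ref, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM scaling factor of `sample` relative to `ref`.

    Genes with a zero count in either library are ignored. Each tail of the
    M values (log fold changes) is trimmed by `logratio_trim`, each tail of
    the A values (average log expression) by `sum_trim`, and the remaining
    M values are averaged with inverse-variance weights.
    """
    sample = np.asarray(sample, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_k, n_r = sample.sum(), ref.sum()
    keep = (sample > 0) & (ref > 0)
    y_k, y_r = sample[keep], ref[keep]
    m = np.log2((y_k / n_k) / (y_r / n_r))        # log fold change per gene
    a = 0.5 * np.log2((y_k / n_k) * (y_r / n_r))  # average log expression
    n = m.size

    def central(x, trim):
        # 1-based ranks; keep ranks lo..hi, i.e. drop `trim` from each tail
        ranks = x.argsort().argsort() + 1
        lo = int(np.floor(n * trim)) + 1
        hi = n + 1 - lo
        return (ranks >= lo) & (ranks <= hi)

    mid = central(m, logratio_trim) & central(a, sum_trim)
    # approximate inverse variances of the log fold changes
    w = 1.0 / ((n_k - y_k) / (n_k * y_k) + (n_r - y_r) / (n_r * y_r))
    return float(2.0 ** (np.sum(w[mid] * m[mid]) / np.sum(w[mid])))


def passes_de_cutoff(p_value, log2_fc, p_cut=0.05, lfc_cut=1.0):
    """Cut-off stated in the text: P < 0.05 and |log2 fold change| > 1."""
    return p_value < p_cut and abs(log2_fc) > lfc_cut
```

For example, if one gene in a sample is strongly up-regulated while the rest are unchanged, the raw library size is inflated, and the TMM factor shrinks the effective library size back so that the unchanged genes are not called down-regulated.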

Immune gene and transcription factor
Immune genes were obtained from the Immunology Database and Analysis Portal (ImmPort) database [36]. Transcription factors play an important role in the molecular regulatory mechanisms of tumorigenesis and progression. To explore potential regulatory relationships between transcription factors and immune genes, 318 transcription factors were retrieved from the Cistrome Cancer database [37]. Associations between tumor infiltrating immune cells and immune genes were explored via the Tumor IMmune Estimation Resource (TIMER) database (https://cistrome.shinyapps.io/timer/) [37]. The tumor infiltrating immune cell dataset downloaded from TIMER involved 11,509 TCGA samples and the original values of six tumor infiltrating immune cell types (B_cell, CD4_Tcell, CD8_Tcell, Neutrophil, Macrophage, and Dendritic).

Study datasets
The flow chart of the current study is presented in Supplementary Fig. 1. The model cohort contained 265 GC patients and the validation cohort contained 279 GC patients. Comparisons of clinical parameters between the model cohort and the validation cohort are presented in Table 1.

Differentially expressed analyses
Differentially expressed analyses (

Functional enrichment analyses
The potential biological functions of the immune genes were explored through Gene Ontology (GO) functional enrichment analyses. The bar plot (Fig. 1C), bubble plot (Supplementary Fig. 2), and chord plot (Supplementary Fig. 3) indicated the following potential biological functions of the immune genes: collagen-containing extracellular matrix, extracellular matrix, endocrine process, extracellular structure organization, regulation of systemic arterial blood pressure by hormone, regulation of systemic arterial blood pressure, positive regulation of response to external stimulus, extracellular matrix organization, regulation of systemic arterial blood pressure mediated by a chemical signal, platelet alpha granule lumen, platelet alpha granule, regulation of inflammatory response, regulation of digestive system process, secretory granule lumen, and cytoplasmic vesicle lumen.
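The text does not name the enrichment test or software. GO over-representation is commonly assessed with a one-sided hypergeometric test, sketched below under that assumption; in practice the resulting P values would also be adjusted for multiple testing across GO terms.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation P value for a single GO term.

    N: genes in the background universe
    K: background genes annotated to the GO term
    n: genes in the query list (e.g. differentially expressed immune genes)
    k: query genes annotated to the term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return upper / total
```

A small P value means the term is annotated to more query genes than random draws of the same size from the background would typically give.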

Prognostic immune genes and regulatory network
Univariate Cox regression identified 160 immune genes as prognostic markers for GC. Transcription factors are key links in molecular regulatory pathways. To better understand the regulatory relationships between transcription factors and immune genes, the current study performed correlation analyses to identify transcription factors closely related to immune genes. Using cut-off values of |correlation coefficient| > 0.5 and P < 0.01, 19 transcription factors and 70 immune genes were identified and used to construct a transcription factor regulatory network of immune genes (Fig. 2) in Cytoscape v3.6.1 [39].
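A minimal sketch of the transcription factor/immune gene edge selection with the stated cut-offs (|r| > 0.5, P < 0.01). The correlation type is not stated, so Pearson correlation is assumed, and the P value uses the Fisher z-transform normal approximation rather than an exact test; the input dictionaries (name mapped to an expression vector across patients) are hypothetical. The returned edge list is the kind of table that could be imported into Cytoscape.

```python
import math

import numpy as np


def correlation_edges(tf_expr, gene_expr, r_cut=0.5, p_cut=0.01):
    """Return (tf, gene, r) edges with |r| > r_cut and P < p_cut.

    Pearson correlation is assumed; the two-sided P value comes from the
    Fisher z-transform normal approximation, not an exact t test.
    """
    edges = []
    for tf, x in tf_expr.items():
        x_arr = np.asarray(x, dtype=float)
        n = x_arr.size
        for gene, y in gene_expr.items():
            y_arr = np.asarray(y, dtype=float)
            r = float(np.corrcoef(x_arr, y_arr)[0, 1])
            r_clipped = max(min(r, 0.999999), -0.999999)  # keep atanh finite
            z = math.atanh(r_clipped) * math.sqrt(n - 3)
            p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
            if abs(r) > r_cut and p < p_cut:
                edges.append((tf, gene, r))
    return edges
```

Each surviving pair becomes one regulatory edge; the sign of r can be kept as an edge attribute to distinguish positive from negative regulation in the network view.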
Survival curve analyses of immune genes (Fig. 3) demonstrated that DFS differed significantly between different immune gene expression statuses (P < 0.05). The predictive value distribution chart

Predictive performance in model cohort
Patients were divided into high- and low-risk groups at the median prognostic signature score, and Fig. 6A demonstrated a significant difference in DFS between the two groups. Concordance indexes were 0.800, 0.809, and 0.856 for 1-year, 3-year, and 5-year DFS (Fig. 6B). Calibration curves are shown in Supplementary Fig. 6.
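The signature score and the concordance index can be illustrated as follows. This sketch assumes the usual Cox linear predictor (sum of coefficient times expression over the signature genes), with hypothetical coefficients because they are not listed in this section, assigns patients strictly above the median to the high-risk group (how ties at the median were handled is not stated), and computes Harrell's overall C-index. The time-specific values reported above may have been computed with a different, time-dependent estimator.

```python
import statistics


def risk_scores(expr_rows, coefs):
    """Linear predictor sum(beta_g * x_g) for each patient.

    expr_rows: list of dicts mapping gene -> expression for one patient.
    coefs: dict mapping gene -> Cox coefficient (hypothetical values here).
    """
    return [sum(beta * row[gene] for gene, beta in coefs.items()) for row in expr_rows]


def median_split(scores):
    """Label patients 'high' if their score is above the median, else 'low'."""
    med = statistics.median(scores)
    return ["high" if s > med else "low" for s in scores]


def harrell_c(times, events, scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had the event and t_i < t_j;
    it is concordant when i has the higher score; score ties count 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.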

Predictive performance in validation cohort
Survival curves (Fig. 7A) demonstrated that DFS in the high risk group was significantly poorer than that in the low risk group. Concordance indexes were 0.911, 0.815, and 0.815 for 1-year, 3-year, and 5-year DFS (Fig. 7B). Calibration curves are shown in Supplementary Fig. 7, and decision curves are presented in Supplementary Fig. 8.
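Decision curves plot net benefit against a threshold probability. A minimal sketch, assuming a binary outcome (event by the chosen time horizon) is known for every patient; the usual survival version instead estimates true and false positives with Kaplan-Meier, which this sketch does not do.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    outcomes: 1 if the event occurred by the horizon, else 0 (assumed known).
    """
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference curve: treat every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is clinically useful at a given threshold when its net benefit exceeds both the "treat all" curve and zero ("treat none").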

Artificial intelligence survival predictive system
An artificial intelligence survival predictive system was developed to provide online prediction of DFS (Fig. 8). This system is available at: https://zhangzhiqiao7.shinyapps.io/Smart_Cancer_Survival_Predictive_System_15_GC_D1006/. It presents three individual mortality risk curves, predicted by the multi-task logistic regression (MTLR) algorithm (Fig. 8A), the random survival forest (RSF) algorithm (Fig. 8B), and the Cox survival regression algorithm (Fig. 8C). The system can also provide the 95% confidence interval of predicted mortality and the median survival time for an individual patient.
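For the Cox component, an individual curve is typically obtained by raising the baseline survival to the patient's hazard ratio, and the reported quantities can be read off any step survival curve. The sketch below assumes a hypothetical baseline survival estimate at a few time points; it does not reproduce the MTLR or RSF models, and the confidence intervals shown by the online system are not computed here.

```python
import math


def individual_survival(baseline_surv, linear_predictor):
    """Cox model: S(t | x) = S0(t) ** exp(x'beta) at each baseline time point."""
    hazard_ratio = math.exp(linear_predictor)
    return [s0 ** hazard_ratio for s0 in baseline_surv]


def median_survival_time(times, surv):
    """Earliest time at which the step survival curve falls to 0.5 or below."""
    for t, s in zip(times, surv):
        if s <= 0.5:
            return t
    return None  # median not reached within the available time points


def mortality_at(times, surv, horizon):
    """Predicted cumulative mortality 1 - S(horizon) from a step curve."""
    s = 1.0
    for t, value in zip(times, surv):
        if t <= horizon:
            s = value
        else:
            break
    return 1.0 - s
```

For instance, with a hypothetical baseline of 0.9, 0.7, and 0.6 at 12, 36, and 60 months, a patient with twice the baseline hazard would have survival of about 0.81, 0.49, and 0.36, so the median survival time would be 36 months.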

Gene survival analysis screen system
Univariate Cox regression identified 160 immune genes as prognostic markers for GC. A precision medical predictive tool named Gene Survival Analysis Screen System was developed to explore the prognostic influence of these 160 immune genes in different subgroups (Fig. 9). The Gene Survival Analysis Screen System is available at: https://zhangzhiqiao7.shinyapps.io/Gene_Survival_Subgroup_Analysis_15_GC_D1006/.
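The comparison of DFS between expression-defined groups within a subgroup is commonly done with a log-rank test; the section does not name the test, so the two-group version below is an assumption for illustration. Its P value uses the chi-square distribution with one degree of freedom, whose upper tail equals `erfc(sqrt(chi2 / 2))`.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 if the event (e.g. recurrence) occurred,
    0 if censored; groups: 0/1 labels (e.g. low/high expression of one gene).
    Returns (chi-square statistic, P value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        # observed minus expected events in group 1 at this event time
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Running this per gene and per subgroup (for example, within each stage group) is one way a screening tool could tabulate the prognostic influence of many genes.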

Independence assessment
In the model cohort, the prognostic signature was an independent risk factor for DFS (Table 3). In the validation cohort, the prognostic signature, American Joint Committee on Cancer (AJCC) PM, and gender were independent risk factors for DFS.
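Independence assessment of this kind is usually a multivariable Cox regression. The sketch below fits one by Newton-Raphson on the Breslow partial likelihood and reports Wald P values; it is an illustrative assumption, not the software the authors used, and real analyses would typically rely on an established package such as R's `survival`.

```python
import math

import numpy as np


def _score_and_information(beta, X, times, events):
    """Score vector and observed information of the Breslow partial likelihood."""
    w = np.exp(X @ beta)
    grad = np.zeros_like(beta)
    info = np.zeros((beta.size, beta.size))
    for i in np.flatnonzero(events):
        at_risk = times >= times[i]          # risk set at the i-th event time
        w_r, X_r = w[at_risk], X[at_risk]
        s0 = w_r.sum()
        xbar = (w_r @ X_r) / s0              # risk-weighted covariate mean
        grad += X[i] - xbar
        info += (X_r.T * w_r) @ X_r / s0 - np.outer(xbar, xbar)
    return grad, info


def fit_cox(X, times, events, max_iter=50, tol=1e-9):
    """Newton-Raphson Cox fit; returns (beta, standard errors, Wald P values).

    X: n x p covariate matrix (e.g. signature score, stage, gender coded 0/1).
    """
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        grad, info = _score_and_information(beta, X, times, events)
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, info = _score_and_information(beta, X, times, events)
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    p = np.array([math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)])
    return beta, se, p
```

A covariate is called an independent risk factor when its coefficient remains significant with the other covariates in the same model; exp(beta) is the adjusted hazard ratio.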

Subgroup analyses
Subgroup analyses demonstrated that DFS in the high risk group was significantly poorer than that in the low risk group across different stage groups in both the model cohort and the validation cohort (Fig. 10).

Clinical correlation analyses
Clinical correlation analyses displayed the correlation coefficients between clinical parameters and immune genes (Fig. 11). Supplementary Fig. 9 depicts the significance of the correlations between clinical parameters and immune genes.

Tumor infiltrating immune cell correlation analyses
The original values of six tumor infiltrating immune cell types (B_cell, CD4_Tcell, CD8_Tcell, Neutrophil, Macrophage, and Dendritic) were downloaded from the Tumor IMmune Estimation Resource database. Fig. 12 shows the correlation coefficients between tumor infiltrating immune cells and immune genes. Supplementary Fig. 10 depicts the significance of these correlations.

Tumor infiltrating immune cells
The median risk score was used to classify patients into high-risk and low-risk groups. The levels of tumor infiltrating immune cells in patients with high and low risk scores are presented in Fig. 13. Scatter plots between tumor infiltrating immune cells and immune genes are shown in Fig. 14. Correlation analyses between tumor infiltrating immune cells and the prognostic score are shown in Fig. 15.
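The correlation method behind Fig. 15 is not stated; Spearman rank correlation is a common choice for infiltration estimates and is assumed in the sketch below, which uses average ranks for ties. The inputs would be, for example, per-patient macrophage estimates and prognostic scores.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because it works on ranks, the coefficient captures any monotonic relationship between infiltration and risk score, not only a linear one.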

Subgroup analyses among different races
Subgroup analyses demonstrated no significant difference in the immune gene prognostic signature among different races (Supplementary Fig. 11).

Discussion
The current study identified 14 immune genes closely related to the prognosis of gastric cancer. These immune genes may become valuable prognostic biomarkers and potential targets for tumor immunotherapy. The current study constructed a transcription factor regulatory network of immune genes, which may help in understanding the potential molecular regulatory mechanisms of tumorigenesis and progression. The current study also developed and validated a prognostic signature for the DFS of GC patients. In addition, we developed two novel artificial intelligence survival predictive tools to predict individual mortality risk, and the artificial intelligence survival predictive system can provide the 95% confidence interval of predicted mortality and the median survival time. These two tools provide individualized mortality risk predictions conveniently, with simple operation and intuitive results. Previous studies have reported several prognostic models for predicting the prognosis of gastric cancer patients [16,17]. However, these prognostic models cannot predict the mortality risk of an individual patient. In recent years, artificial intelligence algorithms, including the multi-task logistic regression algorithm, the Cox survival regression algorithm, and the random survival forest algorithm, have made great progress in survival prediction [21][22][23][24][25][26]. With the support of these algorithms, we established an artificial intelligence survival predictive system that predicts the mortality risk curve of an individual patient, together with the 95% confidence interval of predicted mortality and the median survival time. Individual-level survival prediction and median survival time prediction are distinctive capabilities of our artificial intelligence survival predictive system.
In the current study, we applied three artificial intelligence algorithms to predict the individual mortality risk of cancer patients, providing a feasible approach and a valuable reference for future survival prediction studies.
The current research searched the TISIDB database (http://cis.hku.hk/TISIDB/index.php) to explore the biological processes of these immune genes. The major biological processes of cell death inducing DFFA like effector A (CIDEA) are DNA catabolic process, endonucleolytic; temperature homeostasis; and negative regulation of cytokine production. The major biological processes of V-set and immunoglobulin domain containing 1 (VSIG1) are tissue homeostasis, epithelial cell development, and epithelial cell morphogenesis. The major biological processes of fermitin family member 1 (FERMT1) are ameboidal-type cell migration, establishment or maintenance of cell polarity, and epithelial cell migration. The major biological processes of resistin (RETN) are positive regulation of collagen metabolic process, aging, and regulation of collagen metabolic process. The major biological processes of NLR family, CARD domain containing 5 (NLRC5) are negative regulation of immune system process, response to virus, and positive regulation of cytokine-mediated signaling pathway. The major biological processes of gap junction protein, beta 6, 30 kDa (GJB6) are cellular glucose homeostasis, response to molecule of bacterial origin, and aging. The major biological processes of glypican 3 (GPC3) are retinoid metabolic process, morphogenesis of a polarized epithelium, and ossification. The major biological processes of interferon-induced protein 44-like (IFI44L) are response to virus, defense response to virus, and defense response to other organism. The major biological processes of low density lipoprotein receptor-related protein 8 (LRP8) are regulation of cell morphogenesis involved in differentiation, retinoid metabolic process, and isoprenoid metabolic process. The major biological processes of fibrinogen beta chain (FGB) are extrinsic apoptotic signaling pathway via death domain receptors, vascular process in circulatory system, and adaptive immune response.
The major biological processes of NADPH oxidase 1 (NOX1) are oxidoreduction coenzyme metabolic process, angiogenesis, and response to oxidative stress. The major biological processes of corneodesmosin (CDSN) include keratinocyte differentiation.
The current study identified several valuable prognosis-related biomarkers, which might be potential candidates for targeted treatment. Huang et al. reported that the methylation level of CIDEA was related to tumor microsatellite instability [40]. Cell proliferation was mediated by NOX1 expression in colon carcinoma cell lines [41]. High expression of NOX1 in colon cancer accelerated tumor growth, and inhibition of NOX1 might become a new therapeutic strategy for colorectal cancer [42]. Low expression of IFI44L impaired the antiviral state induced by IFN, and IFI44L might be a potential candidate for reducing virus replication [43]. GPC3 was a potential immune target for hepatocellular carcinoma through fusion to the alpha epitope of HBsAg [44]. GPC3-S-Fab could kill GPC3-positive hepatocellular carcinoma cells through natural killer cells [45]. NLRC5 had a weak to moderate effect on modulating CD8+ T-cell responses in the small intestine of mice with rotavirus infection [46]. NLRC5 could mediate the proliferation, migration, and invasion of renal cell carcinoma through the Wnt/β-catenin signaling pathway [47]. These previous studies indicate potential roles of immune genes in the molecular regulatory mechanisms and pathways of tumorigenesis and progression. The current study constructed a transcription factor regulatory network of immune genes, which may help reveal the potential role of immune genes in tumorigenesis and progression.
Tumor infiltrating macrophages can express interleukin 25, which was significantly related to the prognosis of gastric cancer after radical resection [48]. Macrophages could enhance the invasiveness of gastric cancer cells by enhancing the transforming growth factor beta/bone morphogenetic protein pathway [49]. High infiltration of CD8+ T cells was associated with the prognosis and lymph node metastasis of gastric cancer [50]. A high ratio of regulatory T cells to CD8+ T cells was significantly correlated with poor prognosis of gastric cancer [51]. High infiltration of CD8+ T cells increased programmed death ligand 1 expression and decreased the survival rate [52]. Tumor antigens could stimulate CD8+ T cells [53,54]. Neutrophils could inhibit the anti-tumor ability of dendritic cells [55]. Pro-tumoral neutrophils could up-regulate immunosuppressive dendritic cells [56]. Dendritic cell infiltration plays an important role in the initiation of the primary anti-tumor immune response [57]. Neutrophils could inhibit the immune response and accelerate gastric tumor progression via the GM-CSF-PD-L1 pathway [58].
Advantages of current study: The current research developed artificial intelligence predictive tools for GC patients based on three artificial intelligence algorithms. The artificial intelligence survival predictive system conveniently predicts individualized mortality risk with visual illustrations and numerical presentation. The artificial intelligence predictive tools can provide more accurate individual prognostic information and are better suited to the needs of individualized treatment and precision medicine. To provide more reliable prognostic information for individual patients, three individual mortality risk curves are presented based on different artificial intelligence algorithms.
The current artificial intelligence survival predictive system can also provide the 95% confidence interval of predicted mortality and the median survival time.
Shortcomings of current study: First, the current research explored the clinical significance of immune genes in tumorigenesis and progression based on datasets from public databases, and the conclusions have not yet been verified with our own research data. Second, the sample size of the current research is relatively small, which weakens the credibility of the conclusions to a certain extent. Third, some patients with gastric cancer have comorbidities and other cancers, and the current study did not consider their impact on the individual mortality curve. Fourth, due to the lack of efficacy indicators for radiotherapy and chemotherapy, our predictive system cannot predict the efficacy of different treatment regimens for cancer patients. Fifth, overall survival is a valuable outcome for the prognostic evaluation of gastric cancer; however, due to the lack of an effective clinical dataset, the current study did not establish a prognostic model using overall survival as the final outcome. Prospective basic research will help to further explore the potential role of immune genes in the molecular regulatory mechanisms of tumorigenesis and progression. Sixth, due to the complexity of artificial intelligence algorithms, the calculation process cannot be displayed as a simple formula, which limits the application of artificial intelligence algorithms in the field of tumor prognosis to a certain extent.

Conclusion
The current study constructed a transcription factor regulatory network and developed two artificial intelligence survival prediction tools (https://zhangzhiqiao7.shinyapps.io/Smart_Cancer_Survival_Predictive_System_15_GC_D1006/ and https://zhangzhiqiao7.shinyapps.io/Gene_Survival_Subgroup_Analysis_15_GC_D1006/) for the disease-free survival of gastric cancer patients. These tools are helpful for predicting individual mortality risk and provide valuable prognostic information for individualized treatment decisions.

Ethics approval and consent to participate
The studies in the TCGA and GEO databases received ethical approval from the ethics committees of their respective research institutes, and these studies obtained informed consent from patients before admission. The current study is a secondary analysis of public datasets from the TCGA and GEO databases. All patient details in these public datasets have been anonymized; therefore, the current research does not involve patients' private information. The current study was performed in accordance with public database policies and the Declaration of Helsinki. The TCGA and GEO databases allow researchers to use public datasets for scientific purposes. Ethical approval of this study was waived in accordance with the recommendations of the Management Measures for Ethical Review of Clinical Research of the Ethics Committee of Shunde Hospital, Southern Medical University, because the current study was a retrospective study based on public datasets. Therefore, ethical approval and informed consent were not required for the current study.

Consent for publication
All authors approved the publication.

Availability of data and materials
The study data are available at: https://zhangzhiqiao7.shinyapps.io/Gene_Survival_Subgroup_Analysis_15_GC_D1006/.

Funding
The current research was funded by the Medical Science and Technology Foundation of Guangdong Province (B2018237) and the Foshan Science and Technology Bureau (2020001004584).