Artificial intelligence predictive system of individual survival rate for lung adenocarcinoma

Background
Lung adenocarcinoma (LUAD) is one of the most common malignant tumours, accounting for 1.8 million cancer-related deaths [1]. The prognosis of LUAD patients remains unsatisfactory [2]. Many models currently predict survival rates for LUAD patients at the group level [3,4]. However, prognosis varies considerably among LUAD patients with different clinical characteristics. Therefore, a prognostic prediction for a patient group falls far short of what is needed to make individualized treatment decisions for a particular patient.
In recent years, artificial intelligence has made great progress in cancer research, diagnosis, prognostic prediction, and treatment. Various algorithms have been used to identify lncRNAs closely related to different diseases, providing valuable biomarkers for clinical diagnosis [5][6][7][8]. Artificial intelligence predictive models based on gene expression data can predict prognosis for different tumours [9,10]. Artificial intelligence algorithms based on gene expression data can also be used to predict the efficacy of tumour treatments [11,12]. Tumour image recognition based on deep learning is helpful for the early diagnosis and accurate classification of tumours [13]. These studies suggest that artificial intelligence has broad application prospects in cancer research, diagnosis, prognostic prediction, and treatment.
From 2012 to 2013, Professor Gary S Collins developed several online prognostic tools to predict mortality for different tumours [14][15][16][17][18]. In recent years, several studies have predicted individual survival curves for cancer patients based on different algorithms [19][20][21]. Our research team developed several precision medicine predictive systems that use genetic data to predict individual survival curves for different cancers before clinical treatment [22][23][24][25][26][27][28][29]. Several experts offered valuable suggestions for improving the precision medicine tools presented in our previous articles. Can precision medicine predictive tools provide individualized mortality curves for patients receiving radiotherapy or chemotherapy? Can they convey treatment benefits by comparing the individual mortality curves of a patient under different treatments, which might be more valuable for optimizing individual treatment decisions?

Study cohorts
All datasets were obtained from the Surveillance, Epidemiology, and End Results (SEER) database (2010-2015). All included patients were diagnosed with lung adenocarcinoma (ICD-O-3 code: 8140). To eliminate the confounding effects of other causes, living subjects with a survival time of <12 months were removed from the present study (n = 852).
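The inclusion rule described above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the code used in the study; the `Patient` fields and the toy records are hypothetical and do not correspond to actual SEER column names.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    survival_months: int   # follow-up time in months (hypothetical field name)
    deceased: bool         # True if death was observed during follow-up
    histology_code: str    # ICD-O-3 histology code


def eligible(p: Patient) -> bool:
    """Keep code 8140 cases; drop living subjects followed for < 12 months."""
    if p.histology_code != "8140":
        return False
    if not p.deceased and p.survival_months < 12:
        return False
    return True


# Toy records, invented purely to exercise the rule.
cohort = [
    Patient(5, True, "8140"),    # died at 5 months: kept
    Patient(8, False, "8140"),   # alive with only 8 months of follow-up: removed
    Patient(30, False, "8140"),  # alive with 30 months of follow-up: kept
    Patient(20, True, "8070"),   # different histology code: removed
]
included = [p for p in cohort if eligible(p)]
```

Patients who died within 12 months are retained because their outcome is known; only living subjects with short follow-up are excluded.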

Research methods
Induction and deduction are common methods in scientific research [30]. The data in the current study are cohort data. The current study used induction to summarize the cohort information of LUAD patients and thereby derive general rules of LUAD prognosis. It then used deduction to infer the prognosis of individual patients from the general rules of the overall cohort.

Artificial intelligence algorithms
The random survival forest (RFS) algorithm was performed in accordance with the original studies [31][32][33][34]. The multitask logistic regression (MTLR) algorithm was performed in line with the suggestions of previous articles [35,36]. The Cox proportional hazards algorithm was performed based on the advice in the original articles [37,38].

Statistical analyses
R software 3.5.2 was used to run the statistical analyses and relevant algorithms [22][23][24][25][26][27][28][29]. The statistical analysis steps were as follows. Continuous data with a non-normal distribution were presented as median (first and third quartiles). Continuous data were compared using a nonparametric test. Categorical variables were compared using the chi-square test. The random survival forest method was used to evaluate variable importance. Multivariate Cox regression was carried out to determine risk factors of LUAD. The prognostic score was constructed according to the coefficients of these risk factors. Kaplan-Meier curves were used to present the prognosis of different cohorts. The area under the time-dependent receiver operating characteristic curve and the Brier score were used to assess the accuracy of the prognostic models. The flow chart of the methodology is presented in Fig. 1.
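As a minimal illustration of the Kaplan-Meier step named above, the following Python sketch computes the product-limit estimate from follow-up times and event indicators. It is not the R code used in the study; the function name `kaplan_meier` and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times:  follow-up time of each subject
    events: 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # subjects leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]     # deaths and censored subjects both leave
    return curve


# Toy data (months), invented for illustration.
toy_times = [6, 12, 12, 20, 30, 36]
toy_events = [1, 1, 0, 1, 0, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

Censored subjects contribute to the risk set up to their last follow-up time but never lower the curve themselves, which is why the estimate only steps down at observed death times.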

Study datasets
The included patients (n = 50,687) were randomly split into a model group and a validation group. Baseline features of patients in the two groups are presented in Table 1. After random grouping, there were significant differences in mortality, PN, and laterality between the model group and the validation group, whereas no significant differences were found for the other variables.
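A reproducible random split of this kind might look like the sketch below. The split ratio and random seed are not reported in the text, so the 50/50 fraction and the seed value here are assumptions, as is the function name `split_cohort`.

```python
import random


def split_cohort(ids, model_fraction=0.5, seed=2024):
    """Shuffle patient identifiers and split them into two disjoint groups.

    model_fraction and seed are illustrative assumptions, not study settings.
    """
    rng = random.Random(seed)          # local generator keeps the split reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * model_fraction)
    return shuffled[:cut], shuffled[cut:]


# Integer identifiers stand in for the 50,687 included patients.
model_ids, validation_ids = split_cohort(range(50687))
```

A purely random split does not guarantee balance on every variable, so comparing baseline features between the two groups afterwards remains necessary.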

Variable importance assessment and selection
The random survival forest method was used to assess variable importance and the association between error rate and number of trees. As shown in Supplementary document 1, the variables ranked from most to least important were: stage, PM, chemotherapy, PN, age, PT, gender, and radiation_surgery.
Through multivariable Cox regression, stage, PM, chemotherapy, PN, age, PT, sex, and radiation_surgery were identified as independent prognostic risk factors in the model group (Table 2). In the validation group, the same variables were identified as risk factors of LUAD.
The generated survival rate and 95% confidence interval were shown in Fig. 2A. Moreover, Fig. 2B provided comparisons of four

Performance of prognostic models
The survival curves (Fig. 4) indicated that the three artificial intelligence prognostic models could discriminate patients at high mortality risk from those at low mortality risk in the model cohort.
The survival curves (Supplementary document 2) indicated that the three artificial intelligence prognostic models could also discriminate patients at high mortality risk from those at low mortality risk in the validation cohort. For the 12-month survival rate (Fig. 6A), the concordance indexes of the RFS, MTLR, and Cox models were 0.824, 0.834, and 0.834, respectively. For the 36-month survival rate (Fig. 6B), the concordance indexes of the RFS, MTLR, and Cox models were 0.851, 0.857, and 0.853, respectively. For the 60-month survival rate (Fig. 6C), the concordance indexes of the RFS, MTLR, and Cox models were 0.870, 0.876, and 0.871, respectively.
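For readers unfamiliar with the concordance index, the sketch below shows one common definition, Harrell's C, in plain Python. It is offered only to illustrate the idea of concordance; the time-specific values reported above may have been computed with a different, time-dependent estimator, and the function name and toy data are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i has an observed death
    strictly before the follow-up time of subject j. The pair is
    concordant when the model assigns i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration: four of five comparable pairs agree.
toy_c = concordance_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.6, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why higher concordance is read as better discrimination.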
The Brier scores of the RFS, MTLR, and Cox models were 0.124, 0.152, and 0.143, respectively, indicating that the RFS model was more accurate than the MTLR and Cox models (a lower Brier score indicates higher accuracy).
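The Brier score at a fixed time horizon can be sketched as the mean squared difference between predicted survival probability and observed status. The version below is a deliberate simplification: it drops subjects censored before the horizon instead of applying the inverse-probability-of-censoring weights that survival packages typically use, so it should not be read as the estimator used in the study. The function name and toy data are assumptions.

```python
def brier_score(times, events, predicted_survival, horizon):
    """Simplified Brier score at `horizon` (no censoring weights).

    predicted_survival[i] is the model's predicted probability that
    subject i is still alive at `horizon`.
    """
    total = 0.0
    used = 0
    for t, e, s in zip(times, events, predicted_survival):
        if t > horizon:
            alive = 1.0      # followed past the horizon: known to be alive
        elif e == 1:
            alive = 0.0      # death observed at or before the horizon
        else:
            continue         # censored before the horizon: status unknown
        total += (alive - s) ** 2
        used += 1
    if used == 0:
        raise ValueError("no subject has a known status at the horizon")
    return total / used


# Toy data, invented for illustration (horizon = 36 months).
toy_brier = brier_score(
    times=[6, 40, 20, 50],
    events=[1, 0, 0, 1],
    predicted_survival=[0.2, 0.9, 0.5, 0.7],
    horizon=36,
)
```

Because the score is a squared error, 0 is a perfect prediction and an uninformative prediction of 0.5 for everyone gives 0.25.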
Supplementary documents 3, 4, and 5 show calibration plots of the RFS, MTLR, and Cox models in the model cohort. Supplementary documents 6, 7, and 8 show calibration plots of the RFS, MTLR, and Cox models in the validation cohort.

Discussion
The current study established an artificial intelligence survival predictive system for LUAD patients. Three different artificial intelligence algorithms provided individual survival curves that corroborated one another. More importantly, the system could predict and compare individual mortality risk curves under four treatments, providing comparisons of clinical benefit at the individual level to support individualized treatment decisions.
The current study provides a convenient individual mortality risk predictive tool for lung adenocarcinoma patients. For example, for a particular patient with the following parameters (age 67 years, stage 3, PT 2, PN 1, PM 0, female), the 16-month (user-selected time point) survival rate was 0.51 without treatment (black line in Fig. 2), 0.72 with chemotherapy (blue line in Fig. 2), 0.55 with radiation_surgery (green line in Fig. 2), and 0.75 with combination therapy (red line in Fig. 2). Using the predicted survival rate at a specific time point in the upper part of Fig. 2 and the individual survival curve in the lower part of Fig. 2, patients can easily obtain their own individual survival curves to help optimize individualized treatment decisions.
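The comparison in this example reduces to simple arithmetic on the four predicted rates. The short Python sketch below uses only the 16-month values quoted above; the variable names are illustrative, and the differences are descriptive summaries of model predictions, not estimates of causal treatment effect.

```python
# Predicted 16-month survival rates for the example patient, as quoted in the text.
survival_at_16_months = {
    "no treatment": 0.51,
    "chemotherapy": 0.72,
    "radiation_surgery": 0.55,
    "combination therapy": 0.75,
}

baseline = survival_at_16_months["no treatment"]

# Absolute difference in predicted survival relative to no treatment.
benefit = {
    name: round(rate - baseline, 2)
    for name, rate in survival_at_16_months.items()
    if name != "no treatment"
}

# Option with the highest predicted survival at the selected time point.
best_option = max(survival_at_16_months, key=survival_at_16_months.get)
```

For this patient the predicted gain over no treatment is 0.21 with chemotherapy, 0.04 with radiation_surgery, and 0.24 with combination therapy.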
Several models have been constructed to predict overall survival for lung cancer at the group level [3,4,22]. However, these prognostic models can only forecast mortality risk for groups sharing particular clinical features. Our predictive system provides survival curves at the individual level, which is important for individualized treatment decisions. Additionally, our system predicts and compares the individual survival curves under different treatments, which is valuable for patients making medical decisions before treatment.
Considering the opaque nature of artificial intelligence algorithms, the current research provided three individual survival curves predicted using different artificial intelligence algorithms for clinical application. In glioblastoma multiforme patients, the concordances of prognostic models based on the MTLR, RFS, and Cox algorithms were 0.703, 0.650, and 0.698, respectively (the higher the concordance, the higher the accuracy of the prognostic model), whereas the Brier scores were 0.039, 0.059, and 0.040, respectively (the smaller the Brier score, the higher the accuracy of the prognostic model) [39], indicating that the MTLR algorithm was superior to the RFS and Cox algorithms for prognostic prediction. An MTLR model was reported to have an AUROC of 0.92 and a Brier score of 0.08, suggesting good clinical application value for prognostic prediction [36]. The RFS algorithm performed better than the Cox algorithm for predicting the prognosis of patients with major adverse cardiac and cerebrovascular events [40]. An artificial intelligence research team at Harvard University developed a predictive tool for the prognosis of glioblastoma that provided individual predictions of survival time, one-year survival rate, and overall survival curves [41]. The concordances of prognostic models based on the RFS and Cox algorithms were 0.680 and 0.690, respectively, for glioblastoma patients in another prognostic study [41]. Considering the concordance indexes of the different algorithms in the current research together with the conclusions of previous studies, we recommend the survival curve predicted by the MTLR algorithm first, with the survival curves predicted by the RFS and Cox algorithms as the second and third recommendations.
The random survival forest algorithm can handle multicollinearity, select the most important parameters according to a defined tree threshold, and assess the relative importance of variables [42,43]. The RFS algorithm has been recommended for prognostic models and was reported to be superior to the Cox model in terms of predictive accuracy [44][45][46]. A multitask learning algorithm was also reported to be superior to the Cox algorithm in cancer survival analysis [47]. In the current study, the concordance indexes and calibration plots of the RFS and MTLR models indicated good predictive performance, similar to that of the Cox model. The RFS model achieved a lower (better) Brier score than the Cox and MTLR models. Our results indicate that the RFS and MTLR models have good clinical application value and are not inferior to the Cox model in survival analysis.
Our artificial intelligence predictive tool can predict the survival curve of lung cancer patients under different treatments, together with its 95% confidence interval. An individualized survival prediction function is very important for identifying patients at high mortality risk. The tool provides predictive information on individual patient survival rates that can help optimize the comprehensive management and individualized treatment of LUAD patients. In clinical practice, lung cancer patients predicted by the tool to be at high mortality risk should consider receiving more active and timely antitumour treatment to improve their prognosis.
Artificial intelligence methods have indeed made great progress in tumour diagnosis, treatment, prognostic prediction, and research. However, in the clinical field, artificial intelligence can never become a substitute for medical professionals; it serves as their assistant and tool. Medical treatment is a comprehensive system of prevention and treatment covering physical, psychological, and social aspects, rather than a simple stacking of high-end equipment, cold technology, and complex algorithms. For clinicians, artificial intelligence technology helps to optimize the diagnosis and treatment of tumours.
Limitations: First, because lung cancer has high clinical heterogeneity, its treatment is too complex to follow a unified regimen. Meanwhile, the rapid progress of radiotherapy, chemotherapy, and surgery has made it difficult to define stable clinical subgroups. Although the SEER database provides limited treatment information (including radiotherapy, chemotherapy, and surgery), this information was not sufficient to divide patients into stable subgroups. Second, all study patients were included from 2010 to 2015, resulting in a relatively short follow-up time (minimum follow-up time: 36 months; maximum follow-up time: 83 months). Longer follow-up is needed to ascertain the clinical application value of the survival predictive system for time points beyond 83 months. Third, as nonparametric algorithms, the RFS and MTLR algorithms cannot be directly expressed by conventional mathematical formulas, which weakens the interpretability and clinical application of these predictive models to a certain extent. Fourth, external datasets provide more convincing evidence for the conclusions of prognostic studies. However, we failed to identify a long-term tumour research dataset similar to that provided by the SEER database. Independent external follow-up datasets are of great value for the construction and verification of tumour prognostic models.

Conclusions
The current study designed an individualized survival predictive system, which could provide individual survival curves using three different artificial intelligence algorithms. This artificial intelligence predictive system could directly convey treatment benefits by comparing individual mortality risk curves under different treatments. This artificial intelligence predictive tool is available at https://zhangzhiqiao11.shinyapps.io/Artificial_Intelligence_Survival_Prediction_System_AI_E1001/.

Ethics approval
The current study was approved by the ethics committee of Shunde Hospital, Southern Medical University, and was exempted from informed consent.

Consent for publication
All authors have reviewed the manuscript and consented to its publication.

Availability of data and materials
The study data are available from the SEER database (https://seer.cancer.gov/).

Funding
This study was supported by Foshan Science and Technology Bureau (2020001004584). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.