Prognostic value of cancer antigen -125 for lung adenocarcinoma patients with brain metastasis: A random survival forest prognostic model

Using random survival forest, this study was intended to evaluate the prognostic value of serum markers for lung adenocarcinoma patients with brain metastasis (BM), and tried to integrate them into a prognostic model. During 2010 to 2015, the patients were retrieved from two medical centers. Besides the Cox proportional hazards regression, the random survival forest (RSF) were also used to develop prognostic model from the group A (n = 142). In RSF of the group A, the factors, whose minimal depth were greater than the depth threshold or had a negative variable importance (VIMP), were firstly excluded. Subsequently, C-index and Akaike information criterion (AIC) were used to guide us finding models with higher prognostic ability and lower overfitting possibility. These RSF models, together with the Cox, modified-RPA and lung-GPA index were validated and compared, especially in the group B (CAMS, n = 53). Our data indicated that the KSE125 model (KPS, smoking, EGFR-20 (exon 18, 19 and 21) and Ca125) was the best in survival prediction, and performed well in internal and external validation. In conclusions, for lung adenocarcinoma patients with brain metastasis, a validated prognostic nomogram (KPS, smoking, EGFR-20 and Ca125) can more accurately predict 1-year and 2-year survival of the patients.

for survival data are selected by the multivariate Cox proportional hazards regression with the criterion of those significantly against OS. Although the regression is popularly used, it suffers from high variance and poor performance, especially under the conditions involving multiple factors or nonlinear effects 9 . Random survival forest (RSF) is considered as a more accurate method for right-censored survival data. Based on bootstrap data and the majority votes of the individual decision trees, RSF can construct multiple decision trees to predict the outcome 10 , and models non-linear effects and complex interactions among factors 11 .
Above all, using random survival forest, this study was intended to evaluate the prognostic value of some serum markers for lung adenocarcinoma patients with brain metastasis, and tried to integrate them into a prognostic model.

Materials and Methods
Study Population. During 2010 to 2015, the patients with a history of lung adenocarcinoma were retrospectively reviewed from the cases at the First Affiliated Hospital of Anhui Medical University (AMU) and Cancer Hospital Chinese Academy of Medical Sciences (CAMS). The inclusion criteria were: pathologically verified lung adenocarcinoma (International Association for the Study of Lung Cancer, IASLC, eighth edition 12 ), historically or newly diagnosed BM, and accepted the EGFR gene mutation detection and the laboratory examinations of CA125 (cancer antigen 125), Cy211 (cytokeratins -19 fragments), CA199 (cancer antigen 199), NSE (neuron specific enolase), CEA (carcinoembryonic antigen), SCC (squamous cell carcinoma antigen), and ProGRP (progastrin-releasing peptide). At the diagnosis of BM, besides the factors in the lung-GPA and the modified-RPA model, smoking which defined as more than 40 packs per year was also retrieved. From medical records or by telephone, the patients were followed to the end of November 2016. OS was the day BM diagnosed to death for any reason. Considering the sample size from the two centers, the patients from the AMU (group A) were used to train the prognostic model, which was externally validated by the data from the CAMS (group B). Additionally, to make sure the robustness of the RSF method, the group A was resampled and analyzed in the "Supplementary Data" part. Using SPSS software package, the recruited patients were randomly divided into the group SA (n = 115) and SB (n = 27), which was used to train and validate RSF models, respectively. The protocol was approved by the ethics committee at the AMU and the CAMS.
Qiagen formalin fixed paraffin embedded (FFPE) DNA extraction kit was used to extract genomic DNA. Exons 18, 19, 20 and 21 of the extracted DNA were amplified by polymerase chain reaction (PCR) technique, and were analyzed by direct Sanger sequencing 13 . Because NSCLC patients with EGFR exon 20 insertion were not well respond to gefitinib or erlotinib as those with other mutations 14 , EGFR mutation status was analyzed under two classifications, ie EGFR (exon [19][20][21] and EGFR-20 (exon 18, 19 and 21).
Variable Selection. RSF classifier can select prognostic factors by two indicators: minimal depth and variable importance (VIMP). Minimal depth is the node number from the root node to the parent node of the factor located. The smaller the minimal depth of a factor is, the more ability it has on prediction. Furthermore, the mean number of minimal depth distribution of factors is the threshold value for variable selection, and can be used to decide whether a minimal depth of a factor is small enough as a powerful one 15 . VIMP is a comparable measurement of a factor in predicting the response or causal effect 16 , and is decreased with the increase in prediction error if the factor is randomized 10 . Zero or negative VIMP was not predictive 17 , which could be discarded in further analysis. Above all, minimal depth threshold and VIMP could help us to exclude some factors with low prognostic ability.
However, using all remaining factors to develop a prognostic model might result in overfitting, and may describe random error instead of the underlying relationship. Akaike information criterion (AIC) measures the relative quality of statistical models for a given dataset, and a lower value indicates higher quality and lower overfitting possibility. Therefore, it could be used to step-by-step select variables for developing models 18 . Based on the variable selection method, besides AIC, we also introduced another indicator of concordance index (C-index) to guide us developing potentially eligible models with both lower overfitting possibility and higher prognostic ability. C-index is similar to the area under a receiver operating characteristic (ROC) curve, and a higher percentage indicates higher prognostic ability 19 . Above all, a lower AIC and higher C-index of a model had, and the more explanatory and informative of the model was.
The nomogram of the best model was plotted by the Regression Modeling Strategies package (rms). A calibration plot (bootstrap = 1000) of its predictions were plotted against the observed probabilities. An accurate prognostic nomogram has a plot where the observed and predicted probabilities for given groups fall along the 45-degree line 20 . Internal and External Validation of Prognostic Nomogram. Internal validation was used to select the best from the potentially eligible RSF models, which were also compared with current models (modified RPA and Lung-GPA). Besides C-index (discriminatory ability) and AIC (overfitting possibility), these models were also compared by out-of-bag (OOB) error to estimate the generalization error. Among randomly growing RSF trees, about one-third of the cases are not used for training (OOB data), and can be used to unbiasedly estimate the classification error when trees are added to the forest 21 . Therefore, a lower OOB error indicates a better RSF model.
Additionally, a 10 fold cross-validation was also used to internally validate the performance of these models 22 . The validation method randomly divided the original dataset into10 equal sized subsets, and the model is repeatedly trained and validated 10 times. At each time, 9 subsets are pooled to train the model, and then the model is validated in the retained subset. The average error across 10 rounds is the indicator (integrated Brier score, IBS) for generalizing the model in an independent dataset 23 , and a lower IBS indicates a better model. All data was analyzed with the R project (version 3.3.1). The important software packages for the R project included pec, rms, and randomForestSRC. A two sides of p < 0.05 was considered as the significant level.

Ethics approval and informed consent. Our protocol was approved by the ethics committee of the
First Affiliated Hospital of Anhui Medical University and the National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College. The study was conducted in accordance with the relevant guidelines and regulations. Informed consent was obtained from all participants according to the institutional guidelines.
Data availability statement. The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Furthermore, in the group A, some characteristics were not balanced between already and developed BM patients, for example, KPS scores, primary tumor status, extracranial metastases, and modified-RPA. Generally, compared to the patients already BM, the developed BM patients tended to have poorer performance status and present extracranial metastases. Although, modified-RPA (x 2 = 14.886, P < 0.01) was significantly different between the already and developed BM patients, lung-GPA (x 2 = 3.443, P > 0.05) was not.
Among all patients (n = 195), 99 individuals received the whole brain radiotherapy by the Varian 6-MV linear accelerators. Additionally, radiotherapy was given to 49 patients for primary tumor or/and metastases (n = 6 for radiotherapy alone and n = 42 for combined treatment). Among those with available information (n = 33), 12 and 21 patients received conventional radiotherapy and intensity modulated radiation therapy (IMRT), and their median doses were 48 Gy (20-66 Gy) and 60 Gy (20-70 Gy), respectively.
In the group A and B, EGFR mutation (Exon 18-21) was detected in 69/142 and 27/53 patients (Fig. 1), who received TKI therapy in 49/69 and 19/27 patients, respectively. Among those accepted TKI therapy, the most (59/61 and 20/25) were after BM both in the group A and B, respectively. Besides those received TKI therapy alone (22/142 and 1/53 in the group A and B), 32, 1 and 6 patients of the group A also accepted chemotherapy, radiotherapy and combined treatment, and in 8, 0, and 16 patients of the group B, respectively.
Treatment modalities of the patients are also presented in Table 1. Between the group A and B, treatment (x 2 = 26.915, P = 0.000) and primary tumor control (19/142 vs. 13/53, x 2 = 50.264, P = 0.000) were significantly different, and the difference of treatment mainly resulted from that less patients in the group A received the combined treatment (19/142 vs. 24/53). In the group A, the treatment modalities were not significant against OS (x 2 = 9.205, p = 0.056); however, TKI therapy (Table 1) was significant against OS (x 2 = 7.287, p = 0.026), which resulted from the significance between TKI therapy and no TKI therapy (x 2 = 6.992, p = 0.008). The Kaplan-Meier analysis indicated that groups were significant factor against OS (x 2 = 6.474, P = 0.011). Other significant factors were already or developed BM, TKI therapy, EGFR (or EGFR-20), Cy211, Ca125 and KPS in the group A, and were already or developed BM and KPS in the group B (Table 1).

Survival and Cox
In the multivariate Cox regression of the group A, EGFR (OR: 0.397, 95% CI: 0.397-0.942) and KPS (OR: 4.444, 95% CI: 2.940-6.717) were significant factors, which were confirmed by Table S2 (Supplementary Data). However, Fig. 2 indicates that C-index (or area under ROC) for the Cox model, modified-RPA, and lung-GPA are not so high enough. Therefore, we tried to build a powerful model by step-by-step RSF.

RSF Models.
The data from the group A were used to train RSF models (results of group SA were in the Supplementary Data). Because the VIMP of EGFR-20 (exon 18, 19 and 21) was obviously higher than EGFR's (exon [18][19][20][21], it was used in constructing RSF models (VIMP: 0.0073 vs. 0.0032). Minimal depth and VIMP of variables are plotted in Fig. 3. Among those variables below the minimal depth threshold (4.6023), nine variables had positive VIMP scores, and were qualified for further analysis (already or developed BM, KPS, Treatment, Ca125, TKI therapy, Cy211, EGFR-20, smoking and gender).     According to AIC and C-index, variables were selected step-by-step (Fig. 4), and three models (KECS, KSE125 and KE125) are notable. The KECS model (KPS, EGFR-20, Cy211 and smoking) was the one strictly identified by AIC, the KE125 model (KPS, EGFR-20 and Ca 125) was a simple one with relatively high C-index, and the KSE125 model (KPS, smoking, EGFR-20 and Ca 125) was the one with the highest C-index (77.2%). Additionally, the KTSCS model (KPS, TKI therapy, EGFR-20, Cy211 and smoking) was not evaluated for relatively lower C-index (71.6%).
Model Evaluation and Validation. The Cox and the 3 RSF models, together with the 2 scoring systems, were separately evaluated in the two groups by C-index, OOB, AIC, and integrated Brier score (Fig. 2).
In the group A, compared with others, the KSE125model had the highest C-index (77.4%), and the lowest OOB and AIC value (25.7% and 725.6). Therefore, the KSE125 model was the best for this cohort. In the 10-fold cross validation of the group A, the patients were randomly divided into 10 parts, and validated the model in each part separately. The performance of the model was evaluated by the integrated Brier score (Fig. 2), which indicated that the KSE125 model (13.2%) was only slightly worse than the Cox model (13.0%) and the KE125 model (13.1%).
In the group B, the 3 RSF models were obviously better than GPA, RPA or Cox model. Although the KECS model's C-index and OOB (77.4% and 28.6%) were the best, its AIC and IBS were worse than those of the KE125 or KSE125 models in sequence. Above all, compared to others, the KSE125 model developed from the group A performed well in the group B, and the model had both higher prognostic ability and lower overfitting possibility.
Additionally, the results of the group SA and SB (Supplementary Data, Figures S1-S3) confirmed that the KSE125 model had both higher prognostic ability and lower overfitting possibility. Furthermore, compared to other models, the KSE125 model performs obviously better in both groups.
The prognostic nomogram for the KSE125 model was built for all recruits (Fig. 5A), and its C-index was 75.6% (95% CI: 66.8-84.4%). The predicted 1-year and 2-year OS of the model agreed well with the corresponding actual OS (Fig. 5B and C), and indicated the good performance of the model.

Discussion
Our study indicates that, for lung adenocarcinoma patients with brain metastasis, a validated prognostic nomogram (KPS, smoking, EGFR-20 and Ca125) can more accurately predict the 1-year and 2-year survival of the patients before TKI therapy than other models.
Many prognostic factors had been related to the survival of NSCLC patients. Besides those in the modified-RPA and the lung-GPA model, the factors also included gene mutations and laboratory indicators 13,24,25 . However, all factors could not be simultaneously included in the prognostic model for overfitting. Therefore, how to use them to develop a model with high predicted ability and low overfitting possibility becomes a problem. In the past, the multivariate Cox regression was popularly applied to select variables. However, as in this cohort, the regression was inferior to RSF in developing prognostic models. Some factors without statistical significance in the Cox model, such as smoking and Ca125, could be integrated into RSF models, and could obviously improve the model's prognostic ability without apparently increasing overfitting possibility. Above all, as a substitute for the Cox regression method, the RSF based step-by-step variable selection method could be used to develop prognostic models for better meeting the requirement of survival prediction.
Furthermore, our data indicated that the variable selection method could be used to develop reliable models. Using the method, we identified 3 RSF models, which were all confirmed to have both higher prognostic ability and lower overfitting possibility (Fig. 2). Although all of these RSF models could be used to predict the prognosis of the patients, as indicated by the indicators, the KSE125 model was slightly better than others. Additionally, both the models integrated CA125 (KSE125 and KE125) were slightly better than the KECS model, and indicated that CA125 was an important prognostic factor for the patients.
It should note that some patient characteristics were not balanced between the groups (or hospitals). For example, at the diagnosis of lung adenocarcinoma in the group A, more patients already had BM and presented extracranial metastases. And that, less patients in the group A received the combined treatment, which might result in a lower local control rate and shorter median OS. However, the selection bias between the hospitals could not obviously weaken the performance of the KSE125 model in the group B.
In this study, the KSE125 model was superior to others; furthermore, its nomogram performed well in discrimination and calibration. Four variables of the models (KPS, smoking, EGFR-20 and CA125) were all reported as factors for lung adenocarcinoma patients previously [26][27][28][29] . Among the factors, KPS which stratified into <70, 70-80 and 90-100 was the key one, and others acted to more accurately correct its prognostic ability. One of such factors was CA-125, which was not in present prognostic models for the patients, but turned out to be a valuable one. However, this did not mean that all the four factors were the most powerfully independent ones in the prediction. As indicated by minimal depth, although the factors of treatment, already or developed BM and TKI therapy were also powerful, C-index of their combination with other factors was not so high in this cohort of patients (Fig. 4). Above all, the combination of the KSE125 model was better than other variables' .
Regardless of the fact that the KSE125 model was developed from the patients who did not receive TKI therapy before BM, it could also be applied in those who received. According to the study from Sperduto et al. 7 , among most patients received TKI therapy before BM, the factor of EGFR was still in the Lung-molGPA model for BM. The prognostic value of EGFR-20 could be explained by that the mutations well responded to TKI therapy 14 , and was classified as a protected predictor on the nomogram (Fig. 4).
Additionally, as indicated by our results and those from Gao et al. 30 , some biomarkers from cancer hallmarks have powerful prognostic ability 31 . Currently, more and more these markers were integrated in prediction models; however, overfitting possibility and generalization ability of the models should be thorough evaluated with sufficient sample size. In this study, because only a part of the patients had the information on Alk and Kras mutational status, to develop reliable model, the markers were not evaluated. Considering the importance of the markers in lung adenocarcinoma patients 7 , their prognostic ability would be studied in our future studies.