Prognostic model of long-term advanced stage (IIIB-IV) EGFR mutated non-small cell lung cancer (NSCLC) survivors using real-life data

There is a lack of useful diagnostic tools to identify EGFR mutated NSCLC patients with long-term survival. This study develops a prognostic model using real world data to assist clinicians to predict survival beyond 24 months. EGFR mutated stage IIIB and IV NSCLC patients diagnosed between January 2009 and December 2017 included in the Spanish Lung Cancer Group (SLCG) thoracic tumor registry. Long-term survival was defined as being alive 24 months after diagnosis. A multivariable prognostic model was carried out using binary logistic regression and internal validation through bootstrapping. A nomogram was developed to facilitate the interpretation and applicability of the model. 505 of the 961 EGFR mutated patients identified in the registry were included, with a median survival of 27.73 months. Factors associated with overall survival longer than 24 months were: being a woman (OR 1.78); absence of the exon 20 insertion mutation (OR 2.77); functional status (ECOG 0–1) (OR 4.92); absence of central nervous system metastases (OR 2.22), absence of liver metastases (OR 1.90) or adrenal involvement (OR 2.35) and low number of metastatic sites (OR 1.22). The model had a good internal validation with a calibration slope equal to 0.781 and discrimination (optimism corrected C-index 0.680). Survival greater than 24 months can be predicted from six pre-treatment clinicopathological variables. The model has a good discrimination ability. We hypothesized that this model could help the selection of the best treatment sequence in EGFR mutation NSCLC patients.


Background
Lung cancer continues to be the leading cause of cancer death, with 20-25% of deaths occurring in non-smoking patients, these usually being the cases with mutations in driver genes such as the Epidermal Growth Factor Receptor (EGFR). In the Asian population, the prevalence of the EGFR mutation is 40-50% however in US and Europe is about 15-20% [1]. In Spain, it is around 16% [2]. Overall survival of patients with EGFR mutations continues to improve due to the appearance of different generations of tyrosine kinase inhibitors (TKIs). Osimertinib in first line has achieved the best progression-free survival (PFS) and overall survival (OS) data in the FLAURA study [3].
There is no standard definition of a long survivor in lung cancer and different cut-off points are found in the literature [4,5]. The median of survival in the first pivotal clinical trials with TKIs is over 24 months [6]. Randomized controlled trials (RCT) are considered the gold standard of evidence-based medicine. However, populations included may not be representative patients in real life considering that in many cases, these are older, have a poorer performance status (PS), rare mutations and with brain metastases detected more frequently at diagnosis [7]. For this reason, patient registry serves as real world data studies to verify the results obtained in RCT rather than carrying out phase IV studies in the clinical setting. These permit the inclusion of large numbers of patients, with longer follow-up than RCTs [8,9].
In 2016, the Spanish Lung Cancer Group (SLCG) began a cooperative epidemiological lung cancer registry, with over 500 participating members from at least 78 hospitals in Spain. This registry is a large database containing over 15,000 cases of lung cancer.
This study aims to identifiy the characteristics present at diagnosis of these EGFR mutated patients that are associated with long survival.

Materials and methods
This is an observational, multicenter; retrospective study that updates prospective follow-up data in the population of EGFR mutated lung cancer patients from Spanish hospitals participating in the SLCG thoracic tumor registry. The registry was approved by the Ethics Committee of Puerta de Hierro University Hospital (Majadahonda, Madrid) (No. PI 148/15) and is registered in the ClinicalTrials.gov database (NCT02941458). The study was carried out in accordance with the Helsinki Declaration.
All patients included had histological confirmation of lung cancer and the presence of an EGFR mutation. Cases were included systematically by hospitals participating in the registry. Data was collected via an online questionnaire with the following sections: a) demographic data; b) smoking history, categorized as never-smokers (< 100 lifetime cigarettes), former smokers or ex-smokers (quit > 1 year prior to diagnosis) or current smokers (continued smoking within 1 year of diagnosis) [10], occupation and family history; c) tumor characteristics at diagnosis, including the specific type of mutation and metastatic sites; d) treatments received, including detailed information on each (start and end dates); e) dates of tumor progression and sites; f) survival data.
After reviewing different publications (real-world data and pivotal studies of the main approved treatments) [11] in addition to the development timeline of our study (between 2009-2017), we consider that patients with advanced lung cancer and EGFR mutation, can be defined as long survivors when the overall survival (OS) is greater than 2 years (> 24 months).

Study design and population
Firstly, all cases with EGFR mutation (any type of EGFR mutation and any stage) were collected from the SLCG thoracic tumor registry (Fig. 1). EGFR mutations were detected using the Cobas EGFR assay, a real time PCR test that identifies mutations in exons 18, 19, 20 and 21. In 20 cases, the specific mutation was not available. Only advanced stages IIIB-IV were included. Patients diagnosed before January 2009 were excluded to avoid older cases that had not had been treated with a TKI at any time during their evolution, as well as those diagnosed after December 2017 to guarantee at least 24 months of follow-up. Patients for whom the dates of last follow-up or death were not available were excluded as their survival could not be calculated. Most of the patients included with EGFR mutations were unable to receive osimertinib in first line as it had recently been approved. Therefore, the long-term survivors included, have received different treatment sequences with 1st and 2nd generation TKIs and chemotherapy.

Statistical analysis
A descriptive analysis of the categorical variables was performed using absolute and relative frequencies and, for the numerical variables, using mean and standard deviation or median and percentiles 25 and 75, according to compliance with the normality assumption. OS was defined as time from diagnosis to death for any cause or end of follow-up (May 2020). Median follow-up was estimated using the reverse Kaplan-Meier method along with the 95% confidence interval [12]. The OS curve was estimated using the Kaplan-Meier method.
A long-term survivor was considered to be any patient still alive at 24 months from time of diagnosis. The univariate analysis to identify factors associated with survival at > 24 months was carried out using binary logistic regression.
Variables collected at diagnosis, before the start of treatment, were used for the multivariable analysis. A multivariable analysis was carried out using binary logistic regression for those variables that were significant in the univariate analysis, as well as others in the literature that have shown an association with survival.
A collinearity diagnosis and an automatic backward elimination (threshold to retain a variable p < 0.10) regression modeling strategy were performed, estimating the odds ratio (OR) and their corresponding 95% confidence intervals (95%CI). A nomogram was developed to improve the interpretation and application of the prognostic model in routine clinical practice [13].
Once the final model was obtained, its internal validity was evaluated via calibration and discrimination. Resampling techniques were performed by bootstrapping, with 500 replications. To measure the calibration, a calibration plot was generated in which the quintiles of the observed and expected event risks were graphically confronted. With perfect calibration, the line between the two risks would lie along the main diagonal of the plot. Discrimination was measured using the C-index, this being an analog of the area under the ROC Curve (AUC ROC), with values ranging from 0.5 for no discrimination to 1.0 for perfect discrimination [14,15].
The level of significance was set at 0.05, except for the exceptions previously described. The statistical package used was the Stata/IC v. 16

Study population characteristics
Of the 861 EGFR mutated cases found in the registry, 516 cases were selected. Of these, 8 were excluded due to the lack of data essential for calculating survival. A further 3 cases were reviewed and excluded due to lack of EGFR mutation, giving a final study population of 505 patients (30 stage IIIB, and 475 stage IV). Figure 1 Flow chart.
218 cases (43.17%) were identified with survival less than or equal to 24 months, while 287 patients (56.83%) had survival greater than 24 months. Table 1 summarizes the characteristics of the population included. Average age at diagnosis was 64 years with 60.8% being women and an ECOG 0-1 in 85.72%. With regard to smoking history, only 54% of cases were never smokers, with adenocarcinoma being the most common histology in 95.45%. As to EGFR mutation type, exon 19 deletion was the most common, 55.25%. The second  At the time of diagnosis, the most frequent metastatic locations were bone in 41.2%, lung in 37.6%, and pleural in 26.9% of cases. The most common sites of single metastatic location were bone, 25.3%, lung, 25.3%, and CNS, 14.8%.

Treatments received
Fifteen patients only received palliative care. The majority of patients who were treated received a TKI in the first line, 76.63% while the other 16.84% received chemotherapy. As a consequence of the inclusion stage IIIB patients, 14 (2.77%) cases received concomitant CT-RT. Four patients received immunotherapy in a clinical trial setting (0.79%).
One hundred and ninety cases (37.62%) did not receive a second line of treatment. In 23.16%, the second line was chemotherapy, while 35.84% received another TKI in second line setting. In second line, the administration of concomitant CT-RT was described in 3 patients, while 14 (2.77%) patients received immunotherapy within a clinical trial. The type of TKI received in each line is summarized in Appendix Table 1. Regarding the number of treatment lines received, 3% of the patients did not receive any treatment, while 34% and 29% received one or two types of therapies, respectively. Less than half of the patients received some line of treatment in a clinical trial setting (40.28%). There were no differences between the survival curves in patients who received first-line treatment with TKIs, chemotherapy, radiotherapy and immunotherapy (log-rank p-value = 0.521).

Overall survival of the whole patient cohort
Median follow-up in our cohort was 42 months (95% CI: 38.5-48.5). Appendix Fig. 1 shows the OS curve for the 505 EGFR mutated patients calculated using the Kaplan-Meier method. Median survival was 27.7 months (95% CI: 24.4-32.8). Using the methodology described by Val Gebski [16], 52% data maturity was observed at 60 months, and 47% at 72 months. Data beyond 72 months was, therefore, considered immature and not interpreted. Applying the criterion of the width of the 95% confidence interval, the results in the Kaplan-Meier curve cannot be considered valid when only 7 subjects remain at risk.

Results of the univariate analysis
The univariate analysis was performed based on survival less than or equal to 24 months vs. greater than 24 months to relate the different socio-demographic variables, as well as those related to the tumor (Table 2) and the type of treatment received (Appendix Table 2).
No significant differences were observed in the age at diagnosis between the patients who survived more than 24 months and those who did not, the mean age in both groups being 64 years. There was a higher percentage of women than men with survival more than 24 months (66% vs. 57%, p = 0.004). Among patients with long-term survival, 93% had a better PS (ECOG 0-1) compared to 75% for those with survival less or equal 24 months (p < 0.001). Weight loss (defined as an unintentional weight loss of more than 5% within a six-month period) was also an important factor for survival, being less frequent in patients who survived more than 24 months than in patients with survived 24 months or less (23 and 33% respectively). No statistical relationship with smoking status and survival was found (p = 0,060) between the two populations. We found a clear relationship with the prognosis according to the type of mutation: the deletion of exon 19 was detected more frequently in cases of long survival (59% vs. 50%, p = 0.039), unlike the L858R mutation (29% vs. 27%, p = 0.586). Insertion of exon 20 was a negative prognostic factor for survival (7% vs. 3%, p = 0.020). The development of the T790M resistance mutation was a favorable factor with both the L858R substitution and the exon 19 deletion (Table 2).
Regarding metastatic sites, the univariate analysis revealed that liver, central nervous system (CNS) and adrenal involvement had the greatest impact on survival. Clear differences were also observed according to the number of metastatic sites, with an average of 2 sites for patients with survival less or equal 24 months vs one site for patients with survival more than 24 months (p < 0.001). Participation in a clinical trial (CT) also occurred more frequently in the population with the longest survival. Among patients with survival over 24 months, 58% received a TKI in first line, 74% in second line, and 82.67% in third line. Increase in survival was significantly related to the number of treatment lines received, with a median of 2 lines [1][2][3][4] in those who survived more than 24 months.

Results of the multivariable analysis
Variables that were significant in the univariate analysis were sex; ECOG; weight loss at diagnosis; presence of exon 19 deletion and exon 20 insertion; appearance of the T790M mutation during treatment; presence of CNS, liver or adrenal metastases; burden of metastatic sites and participation in a clinical trial. In order to prevent selection bias, patients who received no treatment at all (n = 15) and those who received osimertinib as first-line treatment (n = 5) were excluded from the model. Variables related to treatments administered after diagnosis, as well as those acquired during follow-up (such as the appearance of the T790M resistance mutation), were not taken into account in the multivariable analysis as the aim of our prognostic model is to understand the probability of survival at time of diagnosis.
After verifying no collinearity (Appendix Table 4) between the independent variables, the maximum model contained the following variables: sex; age; smoking status; TNM staging system; ECOG; weight loss; exon 19 deletion; exon 20 insertion; liver, adrenal and CNS metastases; and total number of metastatic sites. Applying the 1:10 empirical rule [14] when evaluating the number of independent variables to include in the model, up to 21 variables could be entered. An automatic backward regression modeling strategy was performed, eliminating from the maximum model those variables with a significance level p > 0. 10. In the final model, the variables shown in Table 3 remained, with their corresponding OR (95% CI). For internal validation, 500 re-samplings were carried out by bootstrapping and the number of times each variable is selected are shown (Appendix Table 3).
The discrimination ability of the model is good, with a C-index equal to 0.711 (95% CI 0.665-0.757). The Cindex of the final model optimism-adjusted for bootstrap * Patients do not add up to the exact total (n 505) for some variables due to lack of data ** Patients may add up to more than the total (n 505) since a patient may have one or more mutations or metastatic sites is equal to 0.680 (CI 95% 0.627-0.726). The calibration plot (Fig. 2) shows how the calibration line lies almost perfectly along the main diagonal (perfect calibration) except at the ends, which indicates the low number of events and no events at the prediction extremes [13].
Applying the optimism-adjustment for bootstrap, the slope of the calibration curve is equal to 0.781. To facilitate the interpretation and weight of each variable, a nomogram (Fig. 3) was developed. The nomogram provides prognostic information for OS greater than 24 months in a stage IIIB-IV patient with an EGFR mutation. Each of the variables has an associated score, which give a final total score when added together. Drawing a perpendicular line upward from the axis of the overall score gives the model's predicted probability at the time of diagnosis for this patient to achieve survival greater than 24 months.

Discussion
After studying the EGFR mutated population of real life patients, we can state that the OS data in our cohort are comparable with other published real world data and have an adequate follow-up [17,18]. In contrast with other articles reported, our cohort has fewer never smoker patients (54%). Although our data come from daily clinical practice, most patients (85.72%) had a good  ECOG 0-1 and a low median number of metastatic sites at diagnosis [1,2], which may be linked to increasingly early diagnoses with fewer symptoms. Exon 19 deletion is the most frequent mutation in our population, along with a lower frequency of the L858R mutation compared to that published in other series [19]. The most common metastatic site at diagnosis was bone, followed by lung and pleural involvement. The now-standard practice of administering a TKI in first line occurred in our cohort in 76.7% of cases. Given that clinical studies exclude patients with certain characteristics that could indicate worse prognosis, such as brain involvement or poor ECOG, we believe it is important to evaluate real life data that can have a complementary role in determining the efficacy and safety of treatments. Our study does have several limitations, such as its retrospective nature, with data included by different people and a potential inconsistency or lack of data.
There is also a potential selection bias in terms of inclusion of patients in the registry since only 15 patients did not receive any line of treatment, a percentage that is probably below that of in clinical practice, therefore we excluded this patients from the nomogram. Although the best prognosis has been described in never smokers, in our study, smoking history was not related to survival or prognosis [20]. However, it is true that our population contained fewer never smoker patients. Florescu M. et al. [21] extracted 10 factors related to prognosis from the BR21 study with erlotinib, among them, ECOG, weight loss and smoking. In our study, although data about weight loss was available for few patients, it was a significant factor in the univariate analysis. Participation in a clinical trial (CT) was associated with survival however it is probably subject to confounding factors since these patients have better PS. In line with different data in the literature, the presence of an exon 19 deletion yielded better survival data compared to the L858R mutation in exon 21, as well as the emergence of resistance mutation T790M [22] associated with either of the two sensitive mutations (del19 and L858R). In our univariate analysis, we found a significant association with metastatic spread to the liver, CNS and adrenal glands, conferring worse survival prognosis. Various studies clearly demonstrate worse prognosis with liver involvement and the worst ECOG is related to adrenal involvement [23]. In other studies, bone involvement is related to prognosis in patients with an EGFR mutation who receive an TKI [24]. However, we did not find this last association.
Our study demonstrates that a certain combination of variables at diagnosis can help predict prognosis, since we found six variables associated with a higher probability of survival greater than 24 months. We did not include variables either relating to or appearing during the course of treatment, such as the T790M resistance mutation, since we were looking for a tool that could be used at diagnosis to provide personalized information on survival probability for the patient and their oncologist.
Different publications have found an association between female sex and better prognosis in lung cancer independent of EGFR mutation status. One SLCG publication describes a median OS of 32 months for women and 19 months for men with EGFR mutations, while in non-mutated patients the median OS remained higher in women, 19 vs. 12 months [25]. It is well known that ECOG is related to prognosis. This, along with sex, is the variable most represented in prognostic nomograms, highlighting the importance of its weight. There are few nomograms that predict survival in lung cancer but ECOG is one of the variables in the model developed by Keam B. et al. in EGFR mutated patients to predict progression-free survival (PFS) in those receiving a TKI [24]. The presence of exon 20 insertion [26] confers worse survival compared to common sensitivity mutations (del19 and L858R) without response to currently available TKIs and clear implications for prognosis. In our final model, exon 20 insertion was maintained as a predictive variable of prognosis, unlike exon 19 deletion that loses its level of significance. It is known that a greater number of metastatic sites confers worse prognosis [27]. In one publication on EGFR mutated patients, ECOG ≥2, intra-and extra-thoracic metastases, a greater number of metastatic sites, adrenal and liver metastases, and rapid progression at diagnosis were associated with PFS and OS in the univariate analysis [21]. In the multivariable analysis, only ECOG and rapid tumor progression were still associated with worse PFS. Various studies show the predisposition for brain involvement in EGFR mutated adenocarcinoma [28], this being present at diagnosis in 25% of patients and developing at 3 years in approximately 50% of cases [29]. Given that there are TKIs such as osimertinib that can cross the blood-brain barrier (BBB) [30], this provides a better survival to EGFR mutated patients comparing to wild type.
However, brain involvement is clearly a negative factor that always affects survival [31]. Dissemination to the adrenal glands was maintained as a predictive variable of survival in our model as well as liver involvement which it is well-known its relationship with bad prognosis.
In the literature, it has been described how adrenal involvement confers worse prognosis in both wild type and EGFR mutated lung cancer [27]. A review of 409 patients (not selected for EGFR mutation) found a statistically significant relationship with the presence of intraabdominal metastases (with particularly poor prognosis due to the presence of adrenal, p = 0.011, liver metastases, p < 0.001 and intra-abdominal adenopathy, p = 0.014) [23]. Similarly, another publication on the TNM staging system described the worst survival with adrenal involvement, independently of EGFR mutation status [32]. It is known that liver involvement is associated with worse prognosis in lung cancer and therefore it is one of the variables present in our nomogram.
We know that the population with the highest probability of survival extracted from the Spanish Registry of Thoracic Tumors of the GECP has the following characteristics: being a woman, absence of exon 20 insertion, absence of brain metastases, absence of liver and adrenal metastases, fewer metastatic locations, and a better functional status 0-1. Given that this group of patients would have a better prognosis at the beginning and a lower hazard of death, it is in this group that we could carry out a study to sequence different TKIs (1st, 2nd and 3rd generation) and chemotherapy. There are various liquid biopsy studies [33] that have observed that, depending on the allelic frequency of the mutation (MAF), it is possible to identify low-risk patients who could be candidates for sequential treatment. In comparison, patients at a higher risk of death would be those who would benefit the most from starting first-line osimertinib therapy. Therefore, we would be able to combine the variables selected by the nomogram with the molecular determination of the allelic fraction prior to the start of treatment, in order to identify those patients with a lower risk of progression in which it would be possible to sequence the treatments.

Conclusions
With these six variables -sex, ECOG, exon 20 insertion, presence of CNS, liver or adrenal metastases and number of metastatic siteswe have constructed a prognostic nomogram (Fig. 3) with good calibration and discrimination to predict long survival in patients with an EGFR mutation. No other publication exists that only uses patient variables at diagnosis without the therapy initiated in patients with an EGFR mutation. Therefore, this is the first prognostic model that can help predict the probability of long survival before starting therapy.
Given that in our registry almost no patients were able to receive osimertinib in first line (except for 5 patients in a clinical trial) due to its recent approval for that indication, those who survived more than 24 months received first or second generation EGFR TKI and chemotherapy sequences. Therefore, we could select patients for those variables present at diagnosis like being a woman with a good ECOG 0-1, without exon 20 insertion or adrenal, liver or brain metastases and with a low burden of metastatic sites. We believe this population would be the candidate for sequential treatment with different TKIs and chemotherapy to achieve long survival.
Of course, this would need to be tested in randomized clinical trials and it could be interesting to stratify the population based on these variables to randomize treatment sequences with TKIs and chemotherapy vs. osimertinib in first line. It is possible that adding the study of liquid biopsies with quantification of the allelic fraction of the sensitivity mutations and the appearance of the T790M resistance mutation can further improve patient selection to determine the best treatment sequence.
This nomogram could assist clinicians in their daily practice and could be useful to design future clinical trials.