Simplified molecular classification of lung adenocarcinomas based on EGFR, KRAS, and TP53 mutations

Background: Gene expression profiling has consistently identified three molecular subtypes of lung adenocarcinoma that have prognostic implications. To facilitate stratification of patients with this disease into similar molecular subtypes, we developed and validated a simple, mutually exclusive classification.
Methods: Mutational status of EGFR, KRAS, and TP53 was used to define seven mutually exclusive molecular subtypes. A development cohort of 283 cytology specimens of lung adenocarcinoma was used to evaluate the associations between the proposed classification and clinicopathologic variables including demographic characteristics, smoking history, fluorescence in situ hybridization and molecular results. For validation and prognostic assessment, 63 of the 283 cytology specimens with available survival data were combined with a separate cohort of 428 surgical pathology specimens of lung adenocarcinoma.
Results: The proposed classification yielded significant associations between these molecular subtypes and clinical and prognostic features. We found better overall survival in patients who underwent surgery and had tumors enriched for EGFR mutations. Worse overall survival was associated with older age, stage IV disease, and tumors with co-mutations in KRAS and TP53. Interestingly, neither chemotherapy nor radiation therapy showed benefit to overall survival.
Conclusions: The mutational status of EGFR, KRAS, and TP53 can be used to easily classify lung adenocarcinoma patients into seven subtypes that show a relationship with prognosis, especially in patients who underwent surgery, and these subtypes are similar to classifications based on more complex genomic methods reported previously.

Tumors with acinar, papillary, or lepidic histomorphology and mutations or copy number alterations in EGFR, presenting most often in women who have never smoked, predominantly cluster in the terminal respiratory unit subtype. Tumors in the proximal-proliferative subtype have variable histology and commonly display mutations and copy number alterations in KRAS and STK11. In contrast, lung adenocarcinomas with primarily solid architecture and enrichment for TP53 and NF1 mutations and p16 methylation typically cluster in the proximal-inflammatory subtype [8,15].
While molecular subtypes of lung adenocarcinoma have been associated with significant differences in prognosis, routine gene expression profiling in the clinical setting has been limited by cost, complexity, and increased turnaround time [16]. These limitations have led to the development of simplified prognostic models based on the expression of selected genes [9,16]. However, many of these genes, such as PTK7, CIT, SCNN1A, PTGES, ERO1A, ZWINT, DUSP6, MMD, STAT1, ERBB3, and LCK, are not tested routinely in the clinical laboratory.
To fill this need, we developed a simplified molecular subtype classification based on the mutational status of only EGFR, KRAS, and TP53 to facilitate categorization of patients' lung adenocarcinomas into molecular subtypes with relevant prognostic information.

Patient selection for development cohort
We retrospectively reviewed our institutional database for patients treated between May 1, 2010, and October 31, 2015, to identify cytologic specimens of patients with lung adenocarcinoma. Patients with TTF1-negative non-small cell lung cancer, small cell lung cancer, large cell carcinoma, squamous carcinoma, and poorly differentiated carcinoma not otherwise specified were excluded. We reviewed the patients' medical records for demographic characteristics, clinical information, fluorescence in situ hybridization (FISH) results for ALK, ROS1, MET, and/or RET, and mutation profiling data derived by next-generation sequencing (NGS) and polymerase chain reaction (PCR)-based methods (i.e. Sanger sequencing or pyrosequencing). PCR-based methods were restricted to analysis of only EGFR, KRAS, and BRAF hotspots (Supplement Data).

Patient selection for validation cohort
Patients from our institution's Genomic Marker-Guided Therapy Initiative (GEMINI) project database were selected as a validation cohort. This group included patients who underwent computerized tomography-guided transthoracic core-needle biopsy for diagnosis and/or staging of lung adenocarcinomas as well as patients who underwent surgery to resect lung adenocarcinoma between November 1, 2009, and October 31, 2016. Age, sex, race/ethnicity, smoking status, NGS mutation data, survival status, and treatment information were included in the analysis. To avoid Simpson's paradox [17], we combined this cohort with a subset of cytology cases from the development cohort whose medical record numbers matched records in the GEMINI database and who had available survival information and treatment data.

Mutational analysis
NGS was performed on cytology smears or formalin-fixed paraffin-embedded tissue (cytology cell blocks or core biopsy tissue blocks) using the Ion Torrent or Ion Proton (Thermo Fisher Scientific) sequencers in our College of American Pathologists-accredited, Clinical Laboratory Improvement Amendments-certified laboratory. Multiple NGS panels were developed, validated, and implemented in our laboratory during the study period (2009-2016), including an initial hotspot panel of 46 cancer-related genes [18], an updated 50-gene hotspot panel, a 126-gene panel, and a panel of 409 cancer-associated genes [19]. The cytology specimens were appropriately validated [20]. All these panels include several amplicons targeting known hotspots in exons of EGFR, KRAS, and TP53.

Simplified classification of molecular subtypes
We stratified cases from our development and validation cohorts by creating a classification system using the mutational status of EGFR, KRAS, and TP53, forming mutually exclusive groups. Cases that harbored mutations in EGFR only or mutations in EGFR and genes other than KRAS and TP53 were classified as the simplified terminal respiratory unit (sTRU) subtype. Cases with KRAS mutations only or mutations in KRAS and genes other than EGFR and TP53 were classified as the simplified proximal-proliferative (sPP) subtype. Cases with only TP53 mutations or mutations in TP53 and genes other than EGFR and KRAS were classified as the simplified proximal-inflammatory (sPI) subtype. Also, cases with co-mutations in KRAS and TP53 (KRAS/TP53 subtype) or EGFR and TP53 (EGFR/TP53 subtype) were grouped separately. Cases with mutations in genes other than EGFR, KRAS, and TP53 were classified as the non-TRUPPPI subtype, and a few cases that lacked mutations in any of the genes detected by our NGS panels were placed in a "no-mutation" subtype.
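The rules above can be written as a small lookup, shown here as a minimal sketch rather than the authors' implementation. The function name `classify_subtype` and the gene-set input format are assumptions made for illustration. The text does not state how a case with co-mutations in EGFR and KRAS would be handled, so the sketch returns `None` for that combination instead of guessing.

```python
from typing import Iterable, Optional

KEY_GENES = {"EGFR", "KRAS", "TP53"}

# Combination of mutated key genes -> subtype label, as described in the text.
SUBTYPE_BY_KEY_GENES = {
    frozenset({"EGFR"}): "sTRU",
    frozenset({"KRAS"}): "sPP",
    frozenset({"TP53"}): "sPI",
    frozenset({"KRAS", "TP53"}): "KRAS/TP53",
    frozenset({"EGFR", "TP53"}): "EGFR/TP53",
}


def classify_subtype(mutated_genes: Iterable[str]) -> Optional[str]:
    """Return the simplified subtype for one case (hypothetical helper).

    Returns None for key-gene combinations the text gives no rule for,
    such as EGFR together with KRAS.
    """
    mutated = {gene.upper() for gene in mutated_genes}
    if not mutated:
        return "no-mutation"
    key = frozenset(mutated & KEY_GENES)
    if not key:
        # Mutations were detected, but none in EGFR, KRAS, or TP53.
        return "non-TRUPPPI"
    return SUBTYPE_BY_KEY_GENES.get(key)
```

Because each case maps to exactly one key-gene combination, the resulting groups are mutually exclusive by construction; mutations in other genes do not change the assignment once a key gene is mutated.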

Development cohort
Categorical variables were summarized by frequencies and percentages, and continuous variables were summarized using means, standard deviations, medians, and ranges. The Fisher exact test or its generalization was used to compare categorical variables between one molecular subtype and the remaining patients; in addition, a Monte Carlo simulation approach was used when computational issues were encountered.
Patients with indeterminate FISH results or unknown aneuploidy status were excluded from the Fisher exact tests. The Wilcoxon rank sum test or Kruskal-Wallis rank-sum test was used to compare continuous variables between molecular subtypes.
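To illustrate the one-subtype-versus-remaining-patients comparison, the following is a minimal pure-Python sketch of a two-sided Fisher exact test on a 2x2 table. It is not the code used in the study. The helper names `fisher_exact_two_sided` and `one_vs_rest_table` are hypothetical, and the sketch covers only a binary variable, not the generalization to larger tables or the Monte Carlo variant.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def one_vs_rest_table(subtypes, flags, target):
    """Count a binary variable in one subtype versus all remaining patients."""
    a = sum(1 for s, f in zip(subtypes, flags) if s == target and f)
    b = sum(1 for s, f in zip(subtypes, flags) if s == target and not f)
    c = sum(1 for s, f in zip(subtypes, flags) if s != target and f)
    d = sum(1 for s, f in zip(subtypes, flags) if s != target and not f)
    return a, b, c, d
```

A call such as `fisher_exact_two_sided(*one_vs_rest_table(subtypes, never_smoker, "sTRU"))` would then test whether a hypothetical binary variable `never_smoker` is distributed differently in the sTRU subtype than in the remaining patients.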

Validation cohort
Associations between variables and subtypes were assessed as described for the development cohort. The outcome variable of overall survival (OS) time was computed from the date of initial diagnosis to the last follow-up date or death date. For the subset of patients who had surgery, separate analyses were performed of OS from the date of surgery. Cox proportional hazards models were used to evaluate associations of variables with survival outcomes, and Firth penalized Cox regression models were fitted for covariates with zero count of events. In multivariable Cox regression analyses, we included covariates that had p values less than 0.25 in univariate Cox regression models. Treatment variables (surgery, radiation, and chemotherapy) were handled as time-varying covariates. The Kaplan-Meier method was used to estimate survival distributions, and the log-rank test was used for comparisons between survival distributions. All statistical analyses were performed using R version 3.3.11 [21] and SAS version 9.4. All statistical tests used a significance level of 5%, and no adjustments for multiple testing were made.
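As an illustration of the survival quantities described above, the sketch below computes OS time from two dates and a Kaplan-Meier survival curve in plain Python. The study itself used R and SAS; this is only a simplified stand-in. The function names and the 365.25-day year are assumptions, and the sketch does not cover the Cox models, time-varying covariates, or the log-rank test.

```python
from datetime import date


def os_years(start: date, end: date) -> float:
    """OS time in years from diagnosis (or surgery) to death or last follow-up."""
    return (end - start).days / 365.25  # assumed day-to-year convention


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up times
    events : True if death was observed, False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += bool(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve
```

Censored patients contribute to the risk set up to their last follow-up but never lower the curve themselves, which is why the estimate only steps down at observed deaths.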

Validation cohort
To validate these findings and determine the impact of our subtypes on prognosis, we used a validation cohort (n = 428) composed of core-needle biopsy samples or resection specimens from lung adenocarcinoma patients with available data on treatment and follow-up. Histomorphologic subtypes (e.g., mucinous, lepidic, acinar, and solid) were reported in 28.3% (n = 121) of the pathology reports. The mutational data for this cohort were based only on NGS because not all three target genes were assessed in cases where PCR-based single-gene analysis was performed. Also, we included the 63 patients from the cytology cohort in the GEMINI database with treatment and follow-up data available. NGS results were available for 85.7% (n = 54) of these cases.

Clinical and histomorphologic associations according to simplified molecular subtypes

Mutational profiling of lung adenocarcinoma patients in the validation cohort
The simplified molecular subtypes were significantly associated with age, race/ethnicity, sex, smoking status, stage, histomorphology, and FISH results. Table 3 summarizes the associations between molecular subtypes and clinicopathologic variables in the overall validation cohort and each particular subtype. As in the development cohort, variables were compared between patients within a given molecular subtype and the remaining patients.

Prognostic associations according to simplified molecular subtype classification
We assessed overall survival in the validation cohort as previously described. The median follow-up time was 1.87 years (interquartile range: 0.9-3.5 years). The median survival time was 5.93 years (95% CI: 4.57-not reached). We fitted a multivariable Cox proportional hazards regression model to assess associations between OS and the covariates of age, sex, alcohol intake, stage, molecular subtype, and treatment (surgery, radiation, and/or chemotherapy), selected on the basis of univariate analyses with a cutoff p value of 0.25. Patients with older age (hazard ratio [HR] = 1.03, 95% CI: 1.01-1.05) and stage IV disease (HR = 6.51, 95% CI: 2.49-17.04, p = 0.006) had worse OS. Patients in the sTRU subtype had better OS, although the difference did not reach statistical significance (HR = 0.42, 95% CI: 0.18-1.00, p = 0.051), whereas patients in the KRAS/TP53 subtype had worse OS (HR = 2.15, 95% CI: 1.02-4.53, p = 0.043) than those in the other subtypes (Fig. 2a and b). Patients who underwent surgical resection (HR = 0.33, 95% CI: 0.18-0.60, p < 0.001) had better OS than those who did not have surgery. Figure 3a shows significant differences in OS within this subset of patients. We observed statistically significant differences in OS between the sTRU, sPP, and sPI subtypes (Fig. 3b); however, differences between these subtypes when categorized in early stages (I and II) or late stages (III and IV) did not reach statistical significance (not shown). Interestingly, no significant differences in OS were observed in patients who received chemotherapy, and OS was significantly worse in those who received radiation therapy, regardless of molecular subtype (HR = 1.87, 95% CI: 1.18-2.96, p = 0.007). Notably, OS did not significantly differ between the sTRU and EGFR/TP53 subtypes in all patients (p = 0.84; not shown) or in the patients with early-stage tumors (i.e., stages I and II; p = 0.16; not shown) who underwent surgery. A statistically significant OS difference between these two subtypes was observed only in late-stage tumors (i.e., stages III and IV; p = 0.024; Fig. 3c). Conversely, compared to the sPP subtype, the KRAS/TP53 subtype had significantly worse OS among all patients (Fig. 2b) and among those who underwent surgery (p = 0.013; Fig. 3d) regardless of stage.
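For readers who wish to relate a reported hazard ratio and its 95% CI to a p value, the following sketch back-calculates an approximate two-sided Wald p value, assuming the interval is symmetric on the log scale. The function name `wald_p_from_ci` is hypothetical. Because the published intervals are rounded and the fitted models may rely on other tests (for example, penalized likelihood), the result is only an approximation and need not match the reported p values exactly.

```python
from math import erfc, log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_from_ci(hr: float, lower: float, upper: float) -> float:
    """Approximate two-sided Wald p value implied by an HR and its 95% CI."""
    # Standard error of log(HR), recovered from the width of the log-scale CI.
    se = (log(upper) - log(lower)) / (2 * Z_95)
    z = log(hr) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return erfc(abs(z) / sqrt(2))
```

For example, `wald_p_from_ci(2.15, 1.02, 4.53)` uses the KRAS/TP53 estimate given above and returns a value between 0.04 and 0.05.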

Discussion
In this study, we show that the mutational status of three commonly mutated genes can be utilized to create a simplified, mutually exclusive molecular subtype classification of lung adenocarcinomas based on molecular subtypes previously identified using gene expression profiling or larger gene mutation panels and that this simplified classification shows a relationship with prognosis, especially in patients who have undergone surgery.
The simplified classification showed high concordance with most previously reported associations, but there were some notable differences. For example, most patients in the sTRU subtype had advanced-stage disease. Among advanced-stage cases, however, those with the sTRU subtype had a better prognosis, perhaps a reflection of our referral patient population, whose disease often has not responded to first-line therapy and who present with high-grade, advanced-stage tumors. As expected, the sTRU subtype was also enriched for Asian patients with better prognosis and lepidic histology [8,12,22]. Our results also suggest that adenocarcinoma histologic types do not correlate with stage, in keeping with previous findings [12]. However, we observed significant associations between some of the simplified molecular subtypes and morphology. As suggested by Nakaoku et al. and others [23,24], the PP subtype and our sPP subtype are both associated with mucinous histology. Although the sTRU subtype, unlike the TRU subtype [25], was not associated with lepidic histology, it was associated with non-mucinous tumors. KRAS/TP53-mutated tumors more often had solid histology; similarly, Rekhtman et al. [24] found a significant association between a subset of KRAS-mutated tumors and solid histology but did not test for TP53 mutations. Our observations suggest that this association could be unique to KRAS/TP53 co-mutated tumors.
Interestingly, acinar histology was common in tumors that did not show mutations in this study. Since our NGS panels were developed to target specific exons and did not provide whole-exome/genome results, the no-mutation group could harbor infrequent intronic or exonic mutations/variants in EGFR, KRAS, TP53, and/or other genes. Whereas genomic alterations have been found in all tumors tested by various groups [26,27], tumors with rare alterations of currently unknown significance could represent unique subtypes where oncogenesis is not driven by common mutations in known genes and genetic pathways [28].
The prognostic difference between the sTRU and EGFR/TP53 subtypes was observed only in late-stage tumors in our cohort, in keeping with prior findings [29] that have been validated in a recent meta-analysis [30]. In contrast, the prognostic difference between the sPP and KRAS/TP53 subtypes was statistically significant across all groups regardless of tumor stage. We emphasize the distinction of the KRAS/TP53 molecular subtype because it appears to confer the poorest OS. A report from our group demonstrated different subtypes within KRAS-mutated cases and further supports that KRAS/TP53 co-mutation portends the worst prognosis [31]. Furthermore, recognition of this simplified molecular subtype is supported by recent data showing a potential predictive value of the KRAS/TP53 subtype for response to PD-1 blockade immunotherapy [32].
Our findings are concordant with previous reports that radiation [33] and chemoradiotherapy are associated with worse OS, more toxicity, and a higher rate of death during treatment, particularly in older patients [34]. Our work thus builds upon recent evidence suggesting that radiation therapy be reconsidered in patients with lung adenocarcinoma. In keeping with the recent recommendations by the updated molecular testing guidelines for the selection of lung cancer patients for targeted therapy, our results provide additional support for the use of cytology specimens as a valuable sample source for molecular testing in patients with lung adenocarcinoma. NGS further enables testing of FNA material and helps avoid the potential risks associated with surgical biopsies [35-42].
Others have shown that gene expression profiling using microarray technology reliably estimates prognosis [9,16], but the use of microarrays in the clinical setting is limited by the large number of analyzed genes, complex methods, the need for independent validation of the results, low inter-laboratory reproducibility, high cost, long turnaround time, and the need for fresh or frozen tissue [43]. By creating mutually exclusive groups based on easily accessible data, such as EGFR, KRAS, and TP53 status, the classification of lung adenocarcinomas into prognostic molecular subtypes could become readily available in routine clinical practice. While oversimplification is a potential limitation of the classification proposed, we believe this simplified classification provides useful prognostic information while retaining the updated proposed nomenclature (i.e., TRU, PP, and PI). This simplified approach will make it easier for molecular genetics laboratories and clinicians to accurately classify patients and will help maintain consistency across different molecular laboratories employing NGS platforms for genomic analysis. Because of the increasing demand for multigene testing over single-gene tests [42] and because most available NGS panels for testing lung adenocarcinoma samples contain these three key genes, we suggest that this simplified classification be used primarily for results obtained via NGS.

Conclusion
In summary, using mutational data for EGFR, KRAS, and TP53, we have defined prognostic groups similar to those previously identified by more complex genomic methods in patients with lung adenocarcinomas.