Prognostic performance of TNM8 staging rules in oral cavity squamous cell carcinoma

Background: Two major changes to the staging of oral cavity squamous cell carcinoma (OCSCC) were adopted in TNM8: (1) depth of invasion is now used for T staging and (2) extranodal extension for N staging. The aim of this study was to evaluate if TNM8 stratifies OCSCC patients more accurately than TNM7 based on overall survival (OS) statistics and hazard discrimination. Methods: Retrospective study of 297 patients with OCSCC who underwent surgery at our institution. Clinical and pathological data were previously populated from review of medical charts and histological reports. Slides were re-reviewed for depth of invasion measurements. Patients were staged using both TNM7 and TNM8 with overall survival statistics analysed. Results: Overall 118 patients (39.7%) were upstaged using TNM8. Both TNM7 and TNM8 stage categories were highly significant for OS (all p values < 0.0001). Hazard discrimination analysis showed that TNM7 could only differentiate stage III from stage IV disease with significance (OS p = 0.01). In comparison TNM8 could distinguish between stage II and III disease (OS p = 0.047) and between stage III and IV disease (OS p = 0.004). Subsite analysis suggested that both editions of the staging system perform best for tongue primaries. Conclusions: Although TNM8 showed improved hazard discrimination in comparison to TNM7, problems with discriminative ability persisted with 8th edition staging criteria. Large scale validation studies will be required to direct future refinement of the staging rules and to establish if the continued use of a single staging system for all oral cavity subsites is appropriate.


Introduction
The goals of cancer staging systems include the categorization of patients with similar prognosis, which may in turn inform treatment planning, comparison of outcomes, and research. Key components of an ideal cancer staging system include hazard discrimination, whereby each staging subgroup should have different survival to the group above and below, and hazard consistency, meaning that patients within the same subgroup should have similar survival [1]. In an attempt to improve the hazard discrimination and hazard consistency of oral cancer staging, the American Joint Committee on Cancer (AJCC) has made two significant changes in the most recent 8th edition of its staging system: the incorporation of depth of invasion (DOI) into the T-category, and of extranodal extension (ENE) into the N-category [2].
There is a good body of evidence to support these changes in TNM classification. Numerous studies have shown DOI to be a significant patients. The final step was to analyse the interplay between the proposed T and N categories by examining the stage groupings. When the 7th edition stage groupings were applied to the proposed 8th edition T and N criteria it was not possible to discriminate between stage II and stage III disease based on overall survival statistics. Re-analysis following adjustment of the stage groupings yielded improved hazard discrimination. However, the adjusted stage groupings could not be validated using cancer registry data due to lack of availability of DOI measurements and ENE status in the National Cancer Database. Therefore, although institutional data (MSKCC-PMH dataset) supported amendment of the stage groupings, the AJCC elected to leave them unchanged pending future validation [2].
The aim of this study was to evaluate if TNM8 stratifies OCSCC patients more accurately than TNM7 based on overall survival statistics and hazard discrimination.

Patients and methods
This was a retrospective study of 297 patients with primary OCSCC who underwent definitive surgical treatment at the South Infirmary Victoria Hospital between 2000 and 2016. Patients were identified from a pre-existing database. Patients with recurrent cancers, second primary head and neck cancers, or synchronous primary cancers, or who had undergone previous neck irradiation, were excluded. Ethical approval for the study was obtained from the Cork Clinical Research Ethics Committee. Clinical and pathological data including ENE status were previously populated from review of medical charts and histological reports. In cases where ENE was not recorded in the original pathology report, the original pathology slides were reviewed to determine ENE status. Original pathology slides were re-reviewed for DOI measurements by two pathologists at a multiheaded microscope, and the consensus DOI measurement was used for 8th edition staging. DOI was measured by dropping a plumb line from the basement membrane of adjacent intact squamous mucosa to the deepest point of tumour invasion [2]. All patients were then re-staged using both TNM7 and TNM8, according to the data in the final study database.
Survival was calculated from the time of surgery to the time of death or last follow-up in clinic. Patients dying with recurrence or otherwise uncontrolled cancer were considered to have died from disease. Patients dying from medical complications in the first month after surgery were also considered to have died due to cancer. Statistical analysis was performed using XLSTAT (Addinsoft). Survival curves were analysed using the Kaplan-Meier method and the log-rank test. Hazard ratios were calculated using Cox proportional hazards modelling.
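As an illustration of the Kaplan-Meier method named above, the following minimal Python sketch computes the product-limit survival estimate from follow-up times and death indicators. It is not the XLSTAT procedure used in the study; the function name `kaplan_meier` and the toy data are hypothetical and are not drawn from the study cohort.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time at which a death occurred.

    times  : follow-up in months from surgery
    events : 1 if the patient died at that time, 0 if censored at last follow-up
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone who leaves the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (assumption), not taken from the study cohort.
months = [6, 12, 12, 20, 33, 45, 60]
died = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(months, died):
    print(f"{t:>3} months: S(t) = {s:.3f}")
```

Censored patients (those alive at last follow-up) contribute to the number at risk until their last visit but never trigger a step in the curve, which is why the choice of endpoint definition described above directly affects the estimate.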

Results
The study cohort comprised 297 patients (199 males). Clinicopathological and demographic features of the study population are shown in Table 1. Tongue (39%) and floor of mouth (FOM; 28%) were the most frequent subsites.
Within the T-classification, 114 (38.4%) patients were upstaged, with 70 moving from T1 to T2, 32 from T2 to T3 and 12 from T3 to T4. No patient was downstaged. Within the N-classification, 47 of 101 node-positive patients (46.5%) were upstaged, with 8 migrating from N1 to N2a, and 39 migrating from N2b and N2c to N3b. When the stage groupings were applied, 118 patients (39.7%) were upstaged using TNM8. No patient was downstaged. The largest migration (51 patients) was from stage I to II. Of note, using TNM7, no patient had stage IVB disease, which at that time required a T4b primary tumour or a nodal metastasis > 6 cm in size. TNM8 resulted in the migration of 39 cases (13.1% of the overall cohort) into stage IVB due to the presence of ENE. See Table 2 for stage re-distribution.
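The migration percentages reported above follow directly from the counts in the text, as this short Python check illustrates. The variable and function names are illustrative only; all counts are those quoted in the preceding paragraph.

```python
COHORT = 297          # all patients
NODE_POSITIVE = 101   # node-positive patients

# Upstaging counts quoted in the text.
t_upstaged = {"T1->T2": 70, "T2->T3": 32, "T3->T4": 12}
n_upstaged = {"N1->N2a": 8, "N2b/N2c->N3b": 39}
STAGE_UPSTAGED = 118  # patients whose overall stage group rose under TNM8
TO_STAGE_IVB = 39     # patients moved into stage IVB because of ENE


def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


t_total = sum(t_upstaged.values())
n_total = sum(n_upstaged.values())

print("T upstaged:", t_total, pct(t_total, COHORT))
print("N upstaged:", n_total, pct(n_total, NODE_POSITIVE))
print("Stage upstaged:", STAGE_UPSTAGED, pct(STAGE_UPSTAGED, COHORT))
print("Moved to IVB:", TO_STAGE_IVB, pct(TO_STAGE_IVB, COHORT))
```

Note that the N-category percentage uses the node-positive subgroup as its denominator, whereas the other three use the whole cohort.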
Mean and median follow-up were 45 and 33 months respectively. In total, 149 patients died, of whom 82 died from cancer. Seven additional patients who died within the first postoperative month from medical complications were considered to have died from disease.
Both TNM7 and TNM8 staging were highly predictive of disease-specific survival (DSS) and OS on Kaplan-Meier analysis (all p values < 0.0001) (Table 3 and Fig. 1).
When hazard discrimination was analysed, TNM7 could distinguish between stage III and stage IV disease based on both OS and DSS (p = 0.01 and p = 0.001, respectively), but could not discriminate between other contiguous stage groupings. TNM8 could also distinguish between stage III and IVA/IVB disease (OS p = 0.004, DSS p = 0.003), but could not discriminate between stage III and IVA disease, nor between stage I and II disease. However, TNM8 could differentiate between stage II and III disease based on OS, but not DSS (p = 0.047 and 0.15 respectively).
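The pairwise comparison of contiguous stage groups described above can be sketched as a two-group log-rank test. The Python below is a minimal illustration under stated assumptions, not the XLSTAT routine used in the study: the function name `logrank_test` and the example data are hypothetical, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi2, p) with 1 degree of freedom.

    events are 1 for death and 0 for censoring at last follow-up.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    n_a, n_b = len(times_a), len(times_b)  # numbers currently at risk
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({t for t, _, _ in data}):
        at_t = [(e, g) for tt, e, g in data if tt == t]
        d = sum(e for e, _ in at_t)               # deaths at t, both groups
        d_a = sum(e for e, g in at_t if g == 0)   # deaths at t, group A
        n = n_a + n_b
        if d and n > 1:
            obs_minus_exp += d_a - d * n_a / n
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
        # Remove everyone who died or was censored at t from the risk sets.
        n_a -= sum(1 for _, g in at_t if g == 0)
        n_b -= sum(1 for _, g in at_t if g == 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up data in months (assumption), e.g. stage II vs stage III.
chi2, p = logrank_test([14, 30, 41, 55, 60], [1, 0, 1, 0, 0],
                       [5, 9, 16, 22, 48], [1, 1, 1, 0, 1])
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

A staging system with good hazard discrimination would yield a small p-value for every pair of adjacent stage groups, which is the criterion applied to TNM7 and TNM8 in this section.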
In contrast to our findings, the AJCC validation study of TNM8 failed to discriminate between stage II and III disease based on OS, but re-analysis following adjustment of the stage groupings generated improved hazard discrimination. Of note, in our cohort the OS p-value of 0.047 for stage II versus III disease only just reached significance. Therefore, similar to the AJCC, we adjusted the stage groupings in an attempt to improve discriminative ability between stage II and III. Two approaches were utilised: (1) patients with T3N0 disease were moved from stage III to stage II; and (2) patients with T1N1 and T2N1 disease were moved from stage III to stage II. The data were then re-analysed. When T3N0 cases were re-categorised as stage II, the p-value for stage II versus III disease based on OS became non-significant (p = 0.43). In contrast, when T1N1 and T2N1 cases were reclassified as stage II, the p-values for stage II versus III disease for OS and DSS were 0.0001 and 0.001 respectively. However, discriminative ability for stage III versus IV disease was then lost (OS p = 0.30, DSS p = 0.28) (Table 3 and Fig. 2).
Some aspects of the stage grouping survival results were reflected in sub-analysis of the T and N categories. TNM8 could not differentiate between T1 and T2 disease based on OS or DSS, mirroring the inability to discriminate stage I and II disease, whereas TNM7 could discriminate between T1 and T2 with respect to DSS, but not OS. Overall survival statistics were significant for T2 versus T3 cases using both systems, but not for T3 versus T4. Within the N categories, TNM8 could not discriminate N1 from N2 disease. OS hazard discrimination was significant for all other nodal categories using both staging editions (Table 4).
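The two exploratory regroupings described in this section can be expressed as a small lookup, sketched below in Python. The function name `stage_group` and the `regroup` labels are hypothetical. The baseline mapping reflects the standard M0 stage groupings as we understand them and should be treated as an assumption, since the text explicitly states only that T3N0, T1N1 and T2N1 fall in stage III and that T4b or N3 disease falls in stage IVB.

```python
def stage_group(t, n, regroup=None):
    """Stage group for M0 disease from T ('T1'..'T4b') and N ('N0'..'N3b').

    regroup=None      -> unadjusted stage groupings (assumed standard table)
    regroup='T3N0'    -> approach (1): T3N0 moved from stage III to stage II
    regroup='T1-2N1'  -> approach (2): T1N1 and T2N1 moved from stage III to II
    """
    if n.startswith("N3") or t == "T4b":
        return "IVB"
    if n.startswith("N2") or t.startswith("T4"):
        return "IVA"
    if n == "N1":
        if regroup == "T1-2N1" and t in ("T1", "T2"):
            return "II"
        return "III"
    # Remaining cases are node negative (N0).
    if t == "T3":
        return "II" if regroup == "T3N0" else "III"
    return {"T1": "I", "T2": "II"}[t]


# Illustrative cases only; these are not patients from the study cohort.
for t, n in [("T3", "N0"), ("T2", "N1"), ("T1", "N3b")]:
    print(t, n,
          stage_group(t, n),
          stage_group(t, n, "T3N0"),
          stage_group(t, n, "T1-2N1"))
```

Each regrouping moves cases only between stage II and stage III, so any change in stage III versus IV discrimination after regrouping reflects the altered composition of stage III rather than of stage IV.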
Finally, a subanalysis was undertaken to ascertain if the staging system performs differentially depending on oral cavity primary subsite. Tongue and non-tongue subsites were analysed separately. For tongue primaries, OS of T1 versus T2 cases could be separated with statistical significance using TNM8 (HR 3.48, 95% CI 1.20, 10.12; p = 0.02) and was just outside significance for TNM7 (Fig. 3).

Discussion
The 8th edition of the AJCC staging manual for oral cavity cancer represents a significant advance over previous versions, incorporating for the first time DOI of the primary tumour into T staging, and ENE into N staging. Both of these parameters have been shown through an abundance of data to be significant prognosticators in OCSCC. However, although TNM8 has been shown to be an improvement over the previous edition, there is still a lack of data regarding the stage groupings. In the initial validation study undertaken by the AJCC, it was found that the 7th edition stage groupings applied to the 8th edition T and N categories could not discriminate between stage II and III disease. However, the stage groupings were left unaltered pending additional validation data [2]. A further issue that has complicated the introduction of the 8th edition staging is that, since its initial publication, significant corrections and updates have been issued. The most important of these were (1) to correct the erroneous upstaging of tumours ≤2 cm in diameter and with DOI > 10 mm from T1 to T3, and (2) to incorporate DOI criteria into the T4 category, which had originally been omitted [15]. Consequently, the majority of groups who have published to date on the prognostic ability of the AJCC 8th edition have used incorrect versions of the staging system, and the results of these early studies must therefore be interpreted with caution [16].
In a more recently published study, Sridharan et al applied the corrected 8th edition staging criteria to a cohort of 494 patients with early stage oral tongue SCC (tumours ≤ 4 cm and pathologically node negative). Overall 37.9% of patients were upstaged, with 34.5% upstaged from pT2 (stage II) to pT3 (stage III). However, the latter was not associated with improved local or locoregional control supporting the AJCC's conclusion that the categories of stage II and III disease may require adjustment in future iterations of the staging system [16].
Amit et al also evaluated the prognostic ability of the corrected version of TNM8 in early tongue cancer. They restricted their cohort to 7th edition T1 and T2 tumours with both node negative and positive disease. 25% of the 244 patients in this study were upstaged. Overall stage using TNM8 correlated significantly with both OS and DSS on multivariate analysis, but TNM7 did not. The 8th edition also showed better hazard discrimination compared with the 7th edition including between stage II and III disease. When the T and N categories were evaluated separately the only significant survival difference was seen in patients upstaged from T2 to T3 disease, suggesting that the improved performance of TNM8 compared to TNM7 was due to the effect of DOI. The authors postulate that ENE may have lacked impact in their cohort due to the low rate of nodal disease in early stage tongue cancer and/or the modifying effect of adjuvant chemotherapy [17].
Garcia et al took a different approach. They restricted their analysis to cases with nodal metastases in a retrospective study of 1188 patients with head and neck SCC who underwent neck dissection, of whom 270 (23.8%) had oral cavity primaries. Overall 50.1% were node positive, of whom 50.5% were upstaged due to the presence of ENE, with 20.9% of patients upstaged from pN1 to pN2a and 58.4% from pN2 to pN3b. The 8th edition pN categories showed improved hazard discrimination compared to the 7th edition [18].
In contrast to the studies above, we placed no restrictions on either tumour stage or oral cavity subsite for eligibility in our cohort of 297 patients. Both TNM7 and TNM8 correlated highly with DSS and OS. TNM8 showed improved hazard discrimination in comparison to TNM7. In contrast to the AJCC, we found that the 8th edition could differentiate between stage II and III disease based on OS, but not DSS (p = 0.047 and 0.15 respectively). However, issues with discriminative ability persisted with the 8th edition, including failure to distinguish between stage I and II disease and between stage III and IVA disease. Furthermore, our attempts to improve hazard discrimination by adjusting the stage groupings were unsuccessful. The latter finding is unsurprising when sub-analysis of T and N categories is considered. We were unable to replicate the findings of the AJCC validation study, which demonstrated significant survival differences between all contiguous T and N categories [2]. Instead we found that TNM8 could not distinguish T1 and T2 disease, T3 and T4 disease or N1 and N2 disease.

Table 4
Hazard discrimination (hazard ratios with 95% confidence intervals, and p-values for log-rank test) for contiguous T and N categories, TNM7 and TNM8.

A potential explanation for the difference in our results in comparison to the AJCC validation study is variation in tumour subsite. Many of the published papers validating the 8th edition staging comprise mostly or exclusively tongue cancers [16,17]. However, there is evidence that tumour subsite within the oral cavity influences the prognosis of OCSCC. Recent data from the Surveillance, Epidemiology, and End Results (SEER) database for the period 2009-2015 reported a 5-year relative survival of 66.4% for tongue, 51.7% for FOM, 89.7% for lip and 59% for gum and other mouth primaries [19]. The subsite breakdown of the combined MSKCC-PMH dataset used by the AJCC to validate the 8th edition T and N categories and stage groupings is not listed in the staging manual; however, it is likely that tongue cases comprised the predominant subsite. In a paper published by the MSKCC group in 2012 analysing changing trends in smoking and alcohol consumption in 1617 patients with oral cavity SCC, 49% had tongue primaries and 16% FOM primaries [20]. Likewise, in a cohort of surgically resected oral cavity cancer patients extracted from the National Cancer Database covering the period 2009 to 2013, 51.1% were tongue primaries and 15.8% were FOM primaries (n = 16,246) [21].
The National Cancer Database captures approximately 70% of all newly diagnosed cancers in the USA [22], and is therefore likely to provide an accurate reflection of oral cancer subsite in North America. In contrast, our subsite distribution is strikingly different, with a much higher representation of FOM primaries at 28%, and fewer tongue cases at 39%. Notably, sub-analysis confirmed that, in contrast to the findings in our overall cohort, both TNM7 and TNM8 could separate T1 and T2 tongue cases, but not non-tongue cases. TNM7 could also separate stage I and II tongue primaries, whereas the 8th edition could not. However, the latter finding may have been due to lack of power given the smaller numbers of stage I and II cases using TNM8 staging rules.
Our study has a number of limitations, including its retrospective nature and wide timeframe. A further limitation is the lack of balance between groups in our cohort, whereby relatively equal numbers of subjects should ideally be present in each stage grouping to facilitate validation of the staging system [1]. In our study population the percentage of subjects in each stage group varied from 14.8% to 38.4% using TNM7 and from 16.8% to 34.3% (stage IVA and IVB combined) using TNM8 (see Table 2). In particular, stage III tumours were under-represented. Also, our subsite analysis was limited by small numbers. Finally, slides were not re-reviewed to assign ENE status; instead, this parameter was extracted from the histology reports, except in cases where it had not been recorded in the original report. We also included patients with short survival; however, we felt that the inclusion of such patients in this study was appropriate, as advanced stage at the time of surgery may be a risk factor for early postoperative death. On the other hand, major strengths included the re-review of all slides for re-measurement of DOI according to TNM guidelines, ensuring that patients were staged as accurately as possible by TNM8.

Conclusion
Both 7th and 8th edition AJCC stage categories for OCSCC correlated significantly with survival outcomes in our study cohort. Although TNM8 showed superior hazard discrimination in comparison to TNM7, deficiencies persisted with 8th edition staging rules, including inability to distinguish between stage I and II disease, between stage III and IVA disease, and between some contiguous T and N categories. It is possible that some of the differences between the present study and the AJCC validation study and other studies may have been related in part to differing proportions of tongue versus FOM subsite, suggesting that further work is required to investigate the impact of subsite on the prognostic value of DOI, ENE, and the 8th edition TNM staging.