Impact of Tumor Size on Prognosis in Differentiated Thyroid Cancer with Gross Extrathyroidal Extension to Strap Muscles: Redefining T3b

Simple Summary
This study investigated the impact of tumor size on T3b differentiated thyroid cancer prognosis. No significant difference was found in the prognosis of small T3b tumors compared with T1 tumors. Disease-specific survival, disease-free survival, and overall survival were significantly lower only in large T3b tumors compared to T2 and T3a. If T3b tumors are 2 cm or smaller, downstaging may be considered. The modified T category, reclassifying T3b (≤2 cm) as T1, showed better staging performance than the existing category. Adopting this modified T category could improve the prognostic accuracy of the AJCC/TNM staging.

Abstract
The prognostic significance of tumor size in T3b differentiated thyroid cancer (DTC) remains debated and underexplored. This study aimed to examine the varying impact of T3b based on tumor size, analyzing disease-specific survival, disease-free survival, and overall survival. A retrospective review of 6282 DTC patients who underwent thyroid surgery at Seoul St. Mary’s Hospital from September 2000 to December 2017 was conducted. T3b was classified into three subcategories, T3b-1 (≤2 cm), T3b-2 (2–4 cm), and T3b-3 (>4 cm), using the same size criteria for T1, T2, and T3a. T3b-1 showed no significant difference in disease-specific survival compared to T1, and both disease-free and disease-specific survival curves were sequentially ranked as T1, T3b-1, T2, T3a, T3b-2, and T3b-3. The modified T category, reclassifying T3b-1 as T1, demonstrated superior staging performance compared to the classic T category (c-index: 0.8961 vs. 0.8959 and AUC: 0.8573 vs. 0.8518). Tumors measuring 2 cm or less within the T3b category may require downstaging, and a modified T category could improve the precision of prognostic staging compared to the current T category.


Introduction
Differentiated thyroid cancer (DTC) is widely recognized for its favorable prognosis due to its low disease-specific mortality (DSM) rate [1][2][3][4][5]. Despite the generally low DSM of DTC, the American Joint Committee on Cancer/Union for International Cancer Control (AJCC/UICC) TNM staging system determines each tumor (T), regional lymph node (N), and distant metastasis (M) category based on their impact on disease-specific survival (DSS) [6][7][8]. Therefore, the AJCC/UICC TNM staging system should clearly indicate the stratification of DSS corresponding to each disease stage. The current eighth edition of the AJCC/TNM staging system primarily differentiates between T1, T2, and T3a based on tumor size [9][10][11]. However, the classifications of T3b, T4a, and T4b are determined by the presence of cancer invasion into the surrounding structures. Specifically, T3b is defined by gross extrathyroidal extension (gETE) into the strap muscles, determined by the surgeon's visual assessment, irrespective of the tumor size [9,12,13,14,15].
In 2017, T3b was redefined on the basis of gETE rather than minimal ETE (mETE). In response, numerous studies have been conducted to validate the change in the T3b classification [16][17][18][19][20][21][22]. Several studies suggest that gETE limited to the strap muscles does not affect the prognosis [16,18,19,20,21], while other research indicates that it significantly worsens the prognosis [23][24][25][26]. Such inconsistent results may be due to overlooking the impact of tumor size in T3b. Several studies have evaluated the influence of T3b based on the size of the primary tumor [19][20][21]. In our prior institutional research, we found no difference in recurrence rates between T3b with a small tumor size and T2 disease [17]. However, that study had limitations: it investigated only disease-free survival (DFS) rather than DSS, and it omitted T1 and T3a. Building on these findings, our objective is to elucidate the impact of T3b across all tumor size categories. We conducted a thorough prognostic assessment covering DFS, DSS, and overall survival (OS) across not only T2, but also the T1 and T3a categories.
This study aimed to clarify the significance of tumor size in T3b for the DFS and DSS of DTC by comparing T3b subcategories with other T categories and by analyzing the risk factors for DSM. Ultimately, our goal is to propose a modified T category.

Patients
We conducted a retrospective review of 6811 patients with DTC who underwent thyroid surgery at Seoul St. Mary's Hospital (Seoul, South Korea) from September 2000 to December 2017. We excluded 31 patients with distant metastasis at the time of their initial diagnosis, 231 patients who underwent initial surgery at other hospitals, 182 patients with incomplete data, and 85 patients lost to follow-up. Ultimately, a total of 6282 patients were included in the study (Figure 1). Clinicopathological information was validated through pathologic reports, with the exception of T3b, which was confirmed intraoperatively by the surgeons [17]. This study was conducted in accordance with the principles stated in the Declaration of Helsinki (revised in 2013). The Institutional Review Board of Seoul St. Mary's Hospital at the Catholic University of Korea approved the research protocol (IRB No: KC22RISI0611 and date of approval: 30 August 2022), and due to the retrospective nature of the study, the requirement for informed consent was waived.
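The patient flow above can be verified with simple bookkeeping. The following is a minimal sketch that only re-adds the counts reported in this section; the variable names and dictionary labels are illustrative and do not come from the study's own materials.

```python
# Counts reported in the Patients section; names are illustrative only.
screened = 6811
exclusions = {
    "distant metastasis at initial diagnosis": 31,
    "initial surgery at another hospital": 231,
    "incomplete data": 182,
    "lost to follow-up": 85,
}

# Total number of excluded patients and the resulting study cohort.
excluded_total = sum(exclusions.values())
included = screened - excluded_total

print(f"Excluded: {excluded_total}, included: {included}")
```

The resulting cohort size agrees with the 6282 patients stated in the text.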

Perioperative Management and Follow-Up Evaluation
All patients received preoperative evaluation and follow-up in accordance with the 2015 ATA management guidelines [27]. Physical examinations, serum thyroid function tests, thyroglobulin, and anti-thyroglobulin antibody measurements were performed at 2 weeks, 3 months, and 6 months after surgery, and annually thereafter. Neck ultrasound was conducted annually. Patients who needed additional radioactive iodine (RAI) ablation underwent treatment at least 12 weeks post-thyroidectomy, and whole-body scans were conducted approximately 1 week following the RAI ablation. Patients suspected of recurrence underwent additional imaging procedures, such as computed tomography, positron emission tomography/computed tomography, and RAI whole-body scans, to determine the location and severity of the recurrence. Disease recurrence was confirmed by pathological diagnosis from ultrasound-guided fine-needle aspiration/core needle biopsy or surgical biopsy. Mortality data were supplied by the Cancer Center Operations Team at Seoul St. Mary's Hospital and the Central Cancer Registry, which is based on death records from the Korean Statistical Office.

Primary and Secondary Endpoints
The primary endpoint was a comparison of DSS and DFS among the subclassified T categories, and the secondary endpoint was the predictive accuracy of DSS between the traditional and newly modified T categories.

Statistical Analysis
Continuous variables are presented as means and standard deviations, and categorical variables as numbers and percentages. Student's t-test was used to compare continuous variables. To investigate the differences in categorical features among T subcategories, either Pearson's chi-square test or Fisher's exact test was used. Univariate and multivariate Cox regression models were used to identify significant DSS predictors. The hazard ratios (HR) and their 95% confidence intervals (CI) were calculated. Kaplan-Meier survival curves were plotted for DSS, DFS, and OS, and statistically significant differences were identified using a log-rank test. To assess the predictive capability of the newly modified T categories, Harrell's concordance index (c-index) [28] and time-dependent receiver operating characteristic (ROC) curve analysis, as described by Heagerty et al. [29], were used, the latter to compute the integrated area under the curve (AUC). Differences with p-values less than 0.05 were deemed statistically significant. Statistical analyses were conducted using the Statistical Package for the Social Sciences (version 24.0) and R software (version 4.3.1).
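To make the Kaplan-Meier step concrete, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the authors' code (the analyses were run in SPSS and R); the function name `kaplan_meier` and the toy data are hypothetical and were invented purely for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time for each patient
    events : 1 if the event (e.g., disease-specific death) was observed,
             0 if the patient was censored at that time
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: multiply by the conditional survival at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data, invented for illustration (not study data).
toy_times = [2, 3, 3, 5, 8, 8, 9, 12]
toy_events = [1, 0, 1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why censoring-heavy cohorts such as DTC show long flat stretches in DSS curves.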

Univariate and Multivariate Analyses for Disease-Specific Mortality Risk Factors
As indicated in Table 2, for tumors ≤ 2 cm (T1 or T3b-1), the gETE to the strap muscles did not have a significant impact on DSM (HR, 2.754; CI, 0.344-22.036; p = 0.340). The univariate analysis identified age and tumor size as significant risk factors. However, in the multivariate analysis, only age was identified as an independent risk factor (HR, 1.165; CI, 1.086-1.250; p < 0.001). As indicated in Table 3, regarding tumors in a size range of 2-4 cm (T2 or T3b-2), age, vascular invasion, and T category emerged as significant risk factors for DSM in the univariate analysis. The multivariate analysis reaffirmed the significance of age (HR, 1.088; CI, 1.026-1.154; p = 0.005), vascular invasion (HR, 15.159; CI, 3.511-65.450; p < 0.001), and T category (HR, 11.173; CI, 2.120-58.867; p = 0.004), emphasizing that the DSM risk for T3b-2 is higher than that for T2.
As detailed in Table 4, for tumors larger than 4 cm (T3a or T3b-3), age, tumor size, and T category were identified as significant risk factors for DSM in the univariate analysis. In line with this, the multivariate analysis indicated that age (HR, 1.069; CI, 1.008-1.134; p = 0.027), tumor size (HR, 2.131; CI, 1.253-3.626; p = 0.005), and T category (HR, 28.902; CI, 1.984-421.006; p = 0.014) maintained their significance. Results in Tables 3 and 4 underscored that gETE into the strap muscle significantly increased the DSM risk in tumors larger than 2 cm.
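The hazard ratios and intervals above can be read on the log scale. The sketch below assumes that the reported intervals are Wald-type 95% CIs from the Cox models, which the text does not state explicitly. Under that assumption the interval is symmetric around log(HR), so the HR equals the geometric mean of the two bounds, and the standard error of log(HR) can be recovered from the interval width. The helper names `hr_from_ci` and `se_log_hr` are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def hr_from_ci(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)


def se_log_hr(lower, upper):
    """Standard error of log(HR) implied by a Wald-type 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


# (label, reported HR, lower bound, upper bound) as given in the text.
reported = [
    ("gETE, tumors <= 2 cm", 2.754, 0.344, 22.036),
    ("T category, 2-4 cm", 11.173, 2.120, 58.867),
    ("T category, > 4 cm", 28.902, 1.984, 421.006),
]

for label, hr, lo, hi in reported:
    implied = hr_from_ci(lo, hi)
    print(f"{label}: reported HR {hr}, implied HR {implied:.3f}, "
          f"SE(log HR) {se_log_hr(lo, hi):.3f}")
```

A large implied standard error corresponds to a wide interval, which typically reflects a small number of disease-specific deaths within a subgroup.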

Revision of T Category Based on Survival Analysis of T3b Subcategories
In Figure 2, we analyzed DSS, DFS, and OS according to the categories T1, T2, T3a, T3b-1, T3b-2, and T3b-3. In the DSS curve analysis, T3b-1, which is characterized by a smaller primary tumor size, demonstrated a higher DSS compared to T2 (log-rank p < 0.001) (Figure 2a). In the survival curve analysis for DFS and OS, T3b-1 likewise exhibited higher DFS and OS than T2 (log-rank p < 0.001) (Figure 2b,c). As a result, for DSS, DFS, and OS alike, the survival curves were arranged in the sequence of T1, T3b-1, T2, T3a, T3b-2, and T3b-3.
Based on the results shown in Tables 2-4 and Figure 2, tumors measuring 2 cm or less were classified as T1′, regardless of the presence or absence of strap muscle invasion (Figure 3). Only tumors larger than 2 cm that infiltrate the strap muscles were newly defined as T3b′ in the proposed modified category. DSS curves were plotted for both the classic T categories (T1, T2, T3a, and T3b) and the modified T categories (T1′, T2′, T3a′, and T3b′). Both staging systems stratified DSS in the sequence of T1, T2, T3a, and T3b (T1′, T2′, T3a′, and T3b′). However, a significant difference in DSS between T3a′ and T3b′ was observed only in the modified T categories (log-rank p = 0.048).
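The classic and proposed assignment rules can be written as a short decision function. The sketch below is illustrative only: it covers just the T1 to T3b range discussed here (T4a/T4b invasion is not modeled), it assumes that the 2-4 cm range means larger than 2 cm and up to 4 cm, and the function names and the ASCII apostrophe used for the prime symbol are choices made for this example.

```python
def classic_t(size_cm, gete_strap):
    """Classic T category (T1 to T3b only), as described in the text."""
    if gete_strap:
        return "T3b"  # gETE to strap muscles, irrespective of size
    if size_cm <= 2:
        return "T1"
    if size_cm <= 4:
        return "T2"
    return "T3a"


def t3b_subcategory(size_cm):
    """Size-based T3b subcategory used in this study."""
    if size_cm <= 2:
        return "T3b-1"
    if size_cm <= 4:
        return "T3b-2"
    return "T3b-3"


def modified_t(size_cm, gete_strap):
    """Proposed modified T category: T3b-1 is downstaged to T1'."""
    if size_cm <= 2:
        return "T1'"  # with or without strap muscle invasion
    if gete_strap:
        return "T3b'"  # only tumors > 2 cm with strap muscle invasion
    return "T2'" if size_cm <= 4 else "T3a'"


# A 1.5 cm tumor with strap muscle invasion moves from T3b to T1'.
print(classic_t(1.5, True), "->", modified_t(1.5, True))
```

Under this scheme only the 1.5 cm example changes category; tumors larger than 2 cm keep the same label with a prime added.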

Predictive Performance of Classic vs. Modified T Categories for DSS
Table 5 presents Harrell's c-index and the AUC of the time-dependent ROC, used to compare the predictive capabilities of the classic T category and the modified T category for DSS. Harrell's c-index was higher in the modified T category than in the classic T category (0.8961 vs. 0.8959). The AUC of the time-dependent ROC for 5-year DSS was also higher in the modified T category compared to the classic T category (AUC, 0.8573 vs. 0.8518).
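For readers unfamiliar with the metric, the following is a minimal sketch of Harrell's c-index in plain Python. It is a simplified illustration, not the routine used in the study (which was run in R): pairs tied on follow-up time are skipped, ties in the risk score count as half-concordant, and the risk score is assumed to be any number that increases with predicted risk, such as an ordinal rank of the T category. The function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk subject fails first.

    A pair (i, j) is comparable when subject i has an observed event
    strictly before subject j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration (not study data).
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [3, 2, 2, 1, 0]  # e.g., an ordinal rank of T category
print(harrell_c_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. Because a staging category takes only a few distinct values, many pairs are tied on the risk score, which limits how far the index can move when a single subgroup is reassigned.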

Discussion
This study demonstrates the clinical significance of tumor size in the T3b category of DTC. For tumors ≤2 cm, neither DSM nor overall mortality (OM) differed significantly according to the presence or absence of gETE. The multivariate analyses showed that the T3b category significantly affects the DSM risk only in tumors larger than 2 cm, underscoring the importance of size in determining the prognosis in the T3b category. The DSS, DFS, and OS curves were consistently ranked in the following order: T1, T3b-1, T2, T3a, T3b-2, and T3b-3. In summary, the modified T category, which incorporated T3b-1 into the new T1′ category, demonstrated superior predictive capability for DSS compared to the existing T category. These results highlight a significant interaction between tumor size and gETE to the strap muscles.
Song et al. reported that there was no significant difference in DSS when comparing cases with gETE to those without gETE [19,21]. The reason is presumed to be that the majority of T3b tumors are 2 cm or smaller. In our study, out of 349 patients diagnosed with T3b, 239 patients (68.5%) were classified as T3b-1, with tumors of 2 cm or less. In the classic T category, there was no difference in the DSS between T3a and T3b, which supports the findings of previous studies. However, in the modified T category, when T3b-1 was downstaged to T1, a significant difference was observed in the DSS between T3a′ and T3b′. This reemphasizes that, in order to accurately reflect the impact of strap muscle invasion, tumor size must be evaluated concurrently.
Numerous studies have compared mETE and gETE in patients diagnosed with DTC [35,40,41,42,43]. The fact that mETE and gETE carry different prognoses can be extrapolated to explain why the influence of ETE changes according to tumor size. The majority of these studies indicate that the presence of gETE usually suggests a worse prognosis than mETE, including higher rates of recurrence or mortality [35,41,42]. According to the research conducted by Park et al., mETE has a more favorable prognosis than gETE, and the extent of ETE affects tumor recurrence [43]. The distinction between mETE and gETE rests on whether the extension is visible only under a microscope or with the naked eye, so there is a need for a more objective and accurate size standard. In our study, comprehensive analyses identified 2 cm as this criterion.
Another possible explanation for the varying impact of T3b depending on tumor size is that the likelihood of complete surgical removal of the tumor may differ with tumor size. In the MACIS (Metastases, Age, Completeness of resection, Invasion, Size) scoring system, a key indicator for assessing the prognosis of thyroid cancer, the principle of complete resection has long been deemed significant [44,45]. However, due to the lack of standardized guidelines regarding the extent of strap muscle resection, many surgeons rely more on visual assessment than on confirming a safe margin through frozen-section analysis, even when they encounter a T3b stage intraoperatively [46]. Khan et al. reported in a study of the National Cancer Database, which involved a large cohort of 14,471 individuals, that larger tumor size significantly influenced margin positivity (p = 0.021), and margin positivity significantly decreased survival rates (p = 0.038) [47,48]. If the tumor is smaller than 2 cm, the depth of invasion into the strap muscle may also be smaller, potentially making R0 resection more achievable. Shaha also emphasized the importance of complete resection in gETE [49]. Based on the aforementioned studies, the critical prognostic determinant within gETE may be the completeness of the tumor excision, which is affected by the tumor size.
To the best of our knowledge, no previous study has subdivided T3b based on tumor size using the same criteria as applied to T1, T2, and T3a. Furthermore, we compared the performance of the new stage with that of the traditional stage. Numerous prior stage comparison studies have employed Harrell's c-index as an objective measure of a model's predictive ability [9,28,50,51]. In addition, we calculated the AUC of the time-dependent ROC [29,52], and both evaluation methods indicated that the modified T category in this study is superior in evaluating DSS.
This study has clear strengths. First, it draws on a large cohort of 6282 patients with DTC who were followed for nearly a decade, providing substantial long-term follow-up. Second, the analysis evaluated not only DFS, but also DSS and OS, providing a broad perspective on prognosis that includes not only recurrence, but also DSM. It is particularly noteworthy that this study explored DSM, a crucial indicator ideally suited for evaluating the AJCC-TNM staging system. Finally, our research provides objective indicators of staging performance using more than one method, namely Harrell's c-index and time-dependent ROC. This enabled a comparative analysis between the existing staging and the modified staging, further strengthening the robustness of our results.
Nonetheless, this study has several limitations. First, this is a retrospective study from a single center, which may carry the potential for selection bias. Second, the evaluation of gETE relied on the surgeon's visual judgment, introducing a subjective element that could be susceptible to observation bias. Developing a method to evaluate ETE with standardized procedures could provide substantial benefits in the future.

Conclusions
In conclusion, T3b with a smaller tumor size (≤2 cm) demonstrated no significant difference in DSS and DFS compared to T1 in the current 8th edition of the AJCC-TNM staging system. The modified T category, which reclassifies T3b (≤2 cm) as T1, demonstrated better staging performance than the existing category. If the smaller T3b is reclassified as T1, the new stage could provide better prognostic stratification, possibly eliminating the need for aggressive treatment.

Figure 1. Participant flow diagram of patient selection and T3b subcategories.

Table 1. Baseline clinicopathological characteristics of the study population.

Table 2. Univariate and multivariate analyses of disease-specific mortality risk factors in patients with T1 and T3b-1 (≤2 cm).

Table 4. Univariate and multivariate analyses of disease-specific mortality risk factors in patients with T3a and T3b-3 (>4 cm).

Table 5. Comparison of staging systems in classic T categories and modified T categories.