A new clinical model for predicting lymph node metastasis in T1 colorectal cancer

Purpose Lymph node metastasis (LNM) is a crucial factor that determines the prognosis of T1 colorectal cancer (CRC) patients. We aimed to develop a practical prediction model for LNM in T1 CRC.
Methods We conducted a retrospective analysis of data from 825 patients with T1 CRC who underwent radical resection at a single center in China. All enrolled patients were randomly divided into a training set and a validation set at a ratio of 7:3 using R software. Risk factors for LNM were identified through multivariate logistic regression analyses. Subsequently, a prediction model was developed using the selected variables.
Results The lymph node metastasis (LNM) rate was 10.1% in the training cohort and 9.3% in the validation cohort. In the training set, risk factors for LNM in T1 CRC were identified, including depressed endoscopic gross appearance, sex, submucosal invasion combined with tumor grade (DSI-TG), lymphovascular invasion (LVI), and tumor budding. LVI emerged as the most potent predictor for LNM. The prediction model based on these factors exhibited good discrimination ability in the validation sets (AUC: 79.3%). Compared to current guidelines, the model could potentially reduce over-surgery by 48.9%. Interestingly, we observed that sex had a differential impact on LNM between early-onset and late-onset CRC patients.
Conclusions We developed a clinical prediction model for LNM in T1 CRC using five factors that are easily accessible in clinical practice. The model has better predictive performance and practicality than the current guidelines and can assist clinicians in making treatment decisions for T1 CRC patients.
Supplementary Information The online version contains supplementary material available at 10.1007/s00384-024-04621-y.


Introduction
Colorectal cancer (CRC) ranks as the third most prevalent malignant tumor globally and stands as the second leading cause of cancer-related deaths [1]. Its incidence and mortality rates persistently rise, substantially adding to the overall burden of cancer worldwide [2-4]. An expanding array of countries now recognizes the importance of screening colonoscopy and other preventive measures for CRC [5,6].
Advancements in endoscopic techniques have increased both the detection rate of T1 CRC and the number of endoscopic resections [7]. Endoscopic resection is considered curative if there is no evidence of lymph node metastasis (LNM) [8]. However, patients at risk of LNM must undergo radical surgery after endoscopic resection to reduce the risk of cancer recurrence [9]. Hence, accurate assessment of the likelihood of LNM in T1 CRC patients is pivotal for guiding treatment decisions.
In the guidelines from the United States, Europe, Korea, and Japan, certain factors are designated as high-risk for LNM in T1 CRC, warranting surgical resection. These factors include a depth of submucosal invasion (DSI) of ≥ 1000 μm, lymphovascular invasion (LVI), tumor grade (TG, G3: poorly differentiated adenocarcinoma, signet-ring cell carcinoma, or mucinous carcinoma) and tumor budding (TB, BD2/3) [10-17]. However, only about 10% of patients identified as high-risk based on these guidelines actually exhibit LNM, with the vast majority (> 90%) eventually showing negative lymph nodes upon histological examination of the surgical specimen [18-23]. This discrepancy underscores the limitations of current guidelines, which fail to consider additional risk factors. The binary nature of these guidelines leads to significant overtreatment, placing a strain on clinical healthcare resources. Given these challenges, there is an urgent need for a clinical prognostic model that integrates more dependable predictors to customize optimal treatment strategies for T1 CRC patients.
This study analyzed the risk factors for LNM in 825 patients with T1 CRC from a single center in China. Subsequently, a clinical prediction model was established which, beyond the guideline factors, requires only additional data on endoscopic gross appearance and the objective factor of sex. The model demonstrated good predictive performance and practicality, and it is expected to be used clinically to aid in treatment selection for T1 CRC patients.

Study design and population
Patient data for this study were obtained from the prospectively maintained institutional database program of colorectal disease at the Sixth Affiliated Hospital, Sun Yat-sen University (Guangzhou, China). All patients over 18 years of age with primary T1 CRC who had received radical tumor resection between January 2010 and August 2021 were enrolled. Inclusion criteria were ( 1 (5) insufficient pathological and follow-up information. To enhance the robustness of the analysis, all enrolled patients were randomly divided into a training set and a validation set at a ratio of 7:3 using R software, as illustrated in Fig. 1. Furthermore, patients were categorized into lymph node metastasis negative (LNM-negative) and lymph node metastasis positive (LNM-positive) groups based on the presence of lymph node metastasis for subsequent analysis. This study protocol received approval from the Ethics Review Committee of the Sixth Affiliated Hospital, Sun Yat-sen University (2022ZSLYEC-120).
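The random 7:3 division described above can be illustrated with a minimal sketch. The study performed this step in R; the Python version below is only an illustration, and the function name, the seed, and the use of consecutive integers as patient identifiers are assumptions made for the example.

```python
import random


def split_train_validation(patient_ids, train_parts=7, total_parts=10, seed=2024):
    """Shuffle patient IDs reproducibly and cut them into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # the seed is an arbitrary, hypothetical choice
    rng.shuffle(ids)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    n_train = len(ids) * train_parts // total_parts
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 1..825 stand in for the 825 enrolled patients.
train_ids, validation_ids = split_train_validation(range(1, 826))
print(len(train_ids), len(validation_ids))
```

With 825 patients, cutting at the integer part of 70% yields 577 training and 248 validation patients, which matches the cohort sizes reported for this study.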

Data collection
Trained assistants collected demographic, clinical, surgical, and postoperative data from the institutional database. Additionally, the accuracy of pathological information was ensured through confirmation by a minimum of two experienced pathologists.

Study definitions
All pathology reports from enrolled patients were reviewed for the presence of LNM. In this study, patients with an age of CRC onset younger than 50 were defined as early-onset CRC (EOCRC) patients. According to the Paris classification of endoscopic findings of superficial colorectal neoplasms, the tumors were morphologically classified into depressed and undepressed [24]. A lesion of IIc or III classification in the Paris classification was defined as a depressed lesion. Tumor grade (TG) was classified according to the histologic type. In brief, G1 was assigned to tumors diagnosed as papillary adenocarcinoma and well-differentiated tubular adenocarcinoma, G2 to moderately differentiated adenocarcinoma, and G3 to poorly differentiated adenocarcinoma, mucinous adenocarcinoma, or signet ring cell carcinoma. The depth of submucosal invasion (DSI) was measured from the lower margin of the muscularis mucosae to the deepest point of invasion, provided the location of the muscularis mucosae could still be identified or estimated. If the muscularis mucosae was completely absent, the depth was measured from the surface to the deepest invasion edge. DSI as an independent predictor of LNM is controversial. Based on the results of previous studies, we explored the predictive value of a composite indicator combining DSI with TG (DSI-TG). Tumor budding (TB) was defined as a cancer cell nest consisting of 1 to < 5 cells that infiltrated the interstitium at the invasive margin of the cancer. TB was graded based on the number of buds as BD1 (< 5), BD2 (5-9), or BD3 (≥ 10), as previously reported.
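The categorical definitions above translate directly into simple rules. The following sketch encodes three of them (tumor budding grade, onset group, and depressed appearance); the function names and the plain-string encoding of the Paris classes are assumptions for illustration, not part of the study's workflow.

```python
def tumor_budding_grade(bud_count):
    """Grade tumor budding from the number of buds: BD1 (< 5), BD2 (5-9), BD3 (>= 10)."""
    if bud_count < 0:
        raise ValueError("bud_count must be non-negative")
    if bud_count < 5:
        return "BD1"
    if bud_count <= 9:
        return "BD2"
    return "BD3"


def onset_group(age_at_onset):
    """Onset younger than 50 years is early-onset CRC; otherwise late-onset CRC."""
    return "EOCRC" if age_at_onset < 50 else "LOCRC"


def is_depressed(paris_class):
    """A Paris class of IIc or III is treated as a depressed lesion."""
    return paris_class in {"IIc", "III"}


print(tumor_budding_grade(7), onset_group(49), is_depressed("IIc"))
```

Keeping each definition in its own small function makes the cut-offs (5 and 10 buds, 50 years) easy to audit against the text.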

Statistical analyses
Continuous variables were compared using either Student's t-test or the Wilcoxon rank-sum test, while the Chi-squared test was used to compare the distributions of discrete variables. We fitted univariate logistic regression models to estimate the association between each risk factor and LNM. All variables with a P value < 0.05 were then entered into a multivariate model using the "enter" method. Additionally, ROC curve analysis was applied to test the predictive ability of the model in the training set and the validation set. All statistical analyses were performed using SPSS software (Version 22.0) and R software (Version 4.0.0). All statistical tests were two-sided, and a P value < 0.05 was considered statistically significant.
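For readers less familiar with ROC analysis, the area under the curve (AUC) can be understood as the probability that a randomly chosen LNM-positive patient receives a higher predicted risk than a randomly chosen LNM-negative patient, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration only: the study used SPSS and R, and the labels and scores here are invented toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = LNM-positive, 0 = LNM-negative; scores are made-up predicted risks.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs are ranked correctly -> 0.75
```

The pairwise form is quadratic in the number of patients, which is fine for a cohort of a few hundred; a rank-based formula gives the same value more efficiently.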

Characteristics of the training and validation cohorts
The clinical characteristics of both the training and validation sets are summarized in Table 1. The training set comprised 577 patients, while the validation set included 248 patients, making a total of 825 enrolled individuals. The rate of lymph node metastasis (LNM) was 10.1% in the training cohort and 9.3% in the validation cohort, with no statistically significant difference observed (P = 0.731). In the training set, the mean age was 58.5 years, with a male-to-female ratio of 55.5:44.5. Among the 577 lesions, 85 (14.7%) were characterized as depressed type, and 42 (7.3%) exhibited submucosal invasion of less than 1000 μm. BD1 was observed in 514 (89.1%) tumors, and 545 (94.5%) lesions were without lymphovascular invasion (LVI). The clinicopathological characteristics were comparable between the training and validation sets.
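The percentages quoted above follow from the reported counts and the training-set size of 577. A short consistency check, with the dictionary keys chosen here purely for illustration, could look like this:

```python
def percent(count, total):
    """Percentage of total, rounded to one decimal place as in the text."""
    return round(100 * count / total, 1)


N_TRAIN = 577
N_VALIDATION = 248

# (count, percentage reported in the text) for the training set
reported = {
    "depressed type": (85, 14.7),
    "DSI < 1000 um": (42, 7.3),
    "BD1": (514, 89.1),
    "no LVI": (545, 94.5),
}

for name, (count, stated) in reported.items():
    print(name, percent(count, N_TRAIN), stated)

print(N_TRAIN + N_VALIDATION)  # total enrolled patients
```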

Risk factors for LNM and development of a predictive model
The results of the univariate analysis for the training set are presented in Table 2. Consistent with established guidelines, LVI, TB (BD2/BD3), and TG (G3) were identified as risk factors for LNM. Interestingly, DSI (≥ 1000 μm) was not identified as a standalone risk factor, but it reached significance when analyzed in combination with G2/G3. Additionally, sex and the endoscopic gross appearance of depressed type were recognized as additional risk factors for LNM. A logistic regression prediction model was constructed using the six variables demonstrating significant associations with LNM in Table 2. Female sex was a risk factor for lymph node metastasis in T1 colorectal cancer. However, the role of sex was not consistent between early-onset and late-onset populations. In T1 late-onset colorectal cancer (LOCRC), the rate of LNM in female patients (12.8%) was higher than that in male patients (7.2%; P = 0.049). Notably, sex did not exert an impact on LNM in T1 EOCRC patients (P = 0.640), as detailed in Table 3. Supplementary Table 1 provides the clinical-pathological characteristics of EOCRC and LOCRC.
The endoscopic gross appearance of depressed type was another risk factor for LNM. Lesions with a depressed appearance were found to have a shallower DSI, as outlined in Supplementary Table 2.
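For orientation, a logistic regression model built on the five predictors retained in the final model (depressed endoscopic gross appearance, sex, DSI-TG, LVI, and TB) has the general form below. This is the standard textbook formulation, not a reproduction of the fitted model: the coefficients are placeholders, each predictor is assumed here to be coded as a 0/1 indicator, and the symbol names are chosen for this illustration.

```latex
\[
\ln\frac{p}{1-p}
  = \beta_0
  + \beta_1 x_{\text{depressed}}
  + \beta_2 x_{\text{sex}}
  + \beta_3 x_{\text{DSI-TG}}
  + \beta_4 x_{\text{LVI}}
  + \beta_5 x_{\text{TB}},
\qquad
p = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j=1}^{5} \beta_j x_j\right)}},
\qquad
\mathrm{OR}_j = e^{\beta_j}
\]
```

Here p is the predicted probability of LNM, and the odds ratio of each factor is the exponential of its coefficient, so an odds ratio of about 10 corresponds to a coefficient of about ln 10 ≈ 2.3.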

Overall performance of the prediction model
To validate its reliability, the Hosmer-Lemeshow statistic for the model was 2.379 (P = 0.795). When predicting the risk of LNM in the validation set using risk factors from current guidelines, Fig. 2 illustrates that 231 out of 248 lesions were categorized as high risk, while 17 were deemed low risk. Among the high-risk group, 21 lesions (9.1%) exhibited LNM, whereas in the low-risk group, 2 lesions (11.8%) showed LNM. Remarkably, the guideline risk factors did not accurately differentiate lesions with LNM. The guideline-combination model, aiming to avoid an all-or-nothing decision, assigned 202 patients to the low-risk group. However, 43.5% (10 out of 23) of LNM-positive patients were misclassified into the low-risk group. Utilizing the predictive model developed in this study, 121  surgical treatment. However, only 21 patients (9.1%) manifested LNM, suggesting that 210 patients (90.9%) may have undergone unnecessary treatment. Across various cohort studies, an estimated 80-90% of patients are reported to experience overtreatment based on guideline criteria [22, 25-27]. Initially, we formulated a predictive model relying on guideline-based factors (guideline-combination model), which avoided the all-or-nothing decision-making approach and categorized 202 patients (81.5%) into the low-risk group. However, this model had lower sensitivity, with a notable false-negative rate of 43.5% among LNM-positive patients. Subsequently, following a detailed analysis of guideline and non-guideline risk factors, our clinical prediction model, which integrates the DSI-TG risk factor and two easily assessable preoperative factors (sex and endoscopic gross appearance), displayed improved prediction accuracy (98.4%) while avoiding overtreatment in almost half of the patients. The ease of obtaining the additional factors further strengthens the clinical applicability of our clinical prediction model.
A recent meta-analysis has highlighted that DSI is not an independent risk factor for LNM in T1 CRC but holds predictive significance when analyzed in conjunction with other risk factors [28]. Consistent with this finding, our study corroborates that DSI exhibits enhanced predictive efficacy when combined with tumor grade (DSI-TG, AUC = 0.621; DSI, AUC = 0.492). Previous studies have endeavored to enhance the predictive efficacy of DSI by modifying its assessment criteria. Current evaluation models for determining the depth of submucosal invasion in early-stage colorectal cancer primarily include the Haggitt classification and the Kikuchi-SM system. These models are recommended for assessing the risk of pedunculated and sessile polyps, respectively [17].
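The guideline-based stratification of the validation set reported above (231 high-risk lesions with 21 LNM, 17 low-risk lesions with 2 LNM) can be summarized with a few ratios. The sketch below is an illustration only: the function and key names are hypothetical, the sensitivity and specificity values are derived here from the reported counts rather than quoted from the study, and the high-risk counts for the guideline-combination model (46 patients, 13 with LNM) are obtained by subtraction from the reported totals of 248 patients and 23 LNM cases.

```python
def stratification_summary(high_total, high_lnm, low_total, low_lnm):
    """Summarize a high-risk/low-risk split against observed LNM status."""
    total_lnm = high_lnm + low_lnm
    total_no_lnm = (high_total - high_lnm) + (low_total - low_lnm)
    return {
        "lnm_rate_high_risk": high_lnm / high_total,
        "lnm_rate_low_risk": low_lnm / low_total,
        # Share of LNM-positive patients flagged as high risk.
        "sensitivity": high_lnm / total_lnm,
        # Share of LNM-negative patients placed in the low-risk group.
        "specificity": (low_total - low_lnm) / total_no_lnm,
        # High-risk patients without LNM, i.e. potential overtreatment.
        "high_risk_without_lnm": high_total - high_lnm,
    }


guideline = stratification_summary(231, 21, 17, 2)
combination = stratification_summary(248 - 202, 23 - 10, 202, 10)

for name, result in (("guideline", guideline), ("combination", combination)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under the guideline criteria, 210 of the 231 high-risk patients had no LNM, while the guideline-combination model misses 10 of 23 LNM-positive patients (one minus its sensitivity, about 43.5%), which is the trade-off the prediction model is intended to improve.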
Kikuchi et al. introduced a novel model that categorizes the depth of tumor submucosal invasion based on its approximate distance from the muscularis mucosae [29]. According to this model, superficial invasion (within 200-300 μm of the muscularis mucosae) is designated as SM1, while deep invasion (proximal to the muscularis propria) is classified as SM3. Depths of infiltration falling between SM1 and SM3 are denoted as SM2. This model was employed to reevaluate the depth of submucosal invasion in patients with early-stage colorectal cancer. However, the precise definition of SM1 to SM3 within the SM system can vary among studies, potentially introducing subjective variability and limiting the clinical applicability of the model. Our study refrained from altering the assessment criteria for this pathological feature, minimizing the impact of subjective variability on the model's practicality.
Endoscopic screening plays a pivotal role in diagnosing early-stage CRC [5,30]. Recent research has begun to explore the significance of endoscopic gross appearance in predicting LNM in T1 CRC patients. Our study identified the gross appearance of depressed type as a risk factor for LNM in T1 CRC. While guidelines mainly rely on postoperative pathological characteristics, evaluating endoscopic gross appearance can complement preoperative assessments. The classification of endoscopic gross appearance into the depressed type is based on the simplified Paris classification, which makes it straightforward to obtain clinically and supports the practicality of our clinical prediction model. Additionally, the depressed type of endoscopic gross appearance is associated with a higher risk of LNM and a shallower average DSI. Because DSI is one of the criteria for surgical resection in the guidelines, this finding underscores the importance of reassessing the DSI criterion for different endoscopic gross appearances.
Our study revealed that protective factors in female patients with T1 CRC are often overlooked. Despite numerous cohort studies examining objective risk factors, the impact of gender on LNM in T1 CRC remains contentious, with inconsistent findings reported [19,21,31]. A pertinent meta-analysis suggests a higher likelihood of LNM among female patients with T1 CRC [32]. However, several studies underscore the protective influence of estrogen in CRC, challenging the notion that females are inherently more susceptible to LNM in T1 CRC [33-35]. In our study, univariate analysis showed that the effect of sex differed across age groups: sex had no effect on LNM in T1 EOCRC, whereas in T1 LOCRC, female sex emerged as a risk factor for LNM (P = 0.049). Estrogen levels appear to differ notably between female T1 EOCRC and female T1 LOCRC patients, and this divergence may account for the distinct influence of sex on LNM in the two patient groups. A population-based study conducted in the United States on the risk factors for LNM in young T1 CRC patients also supports this conclusion [36]. The authors found that the overall LNM rate in T1 CRC was approximately 22% in young patients (less than 45 years old), with a slightly higher incidence in females. Tumor size and tumor grade were significant predictors of LNM in T1 CRC patients. However, gender was not found to be an independent predictor of LNM in young patients. The differing proportions of these two patient groups across cohorts might contribute to the differing research conclusions on the objective risk factor of sex. The rising incidence of EOCRC, together with the declining incidence among the elderly, has become a new global trend in CRC epidemiology [30,37]. Therefore, in subsequent multi-center cohort studies with larger datasets that develop risk scoring systems, it is imperative to conduct separate investigations for EOCRC and LOCRC, which represents a key focus for our future work.
The current study has some limitations. Firstly, it lacks an external validation cohort for prospective assessment, relying solely on data from a single center. Further validation from multi-center sources would enhance the robustness and generalizability of the findings. Secondly, a small amount of data was missing. Thirdly, the restricted sample size hampers the development of predictive models for T1 EOCRC and T1 LOCRC separately. However, our clinical prediction model exhibited significant predictive power in both groups (Supplementary Fig. 1). This is attributed to the considerably larger number of patients with LOCRC than EOCRC, while the effect of gender is less pronounced. Thus, while the clinical prediction model in this study exhibits practical utility, its efficacy necessitates further refinement and validation. Currently, it primarily serves as a tool to aid clinicians in formulating treatment strategies for T1 CRC patients.
We developed a clinical prediction model for lymph node metastasis in T1 colorectal cancer patients, based on five factors: sex, endoscopic gross appearance, depth of submucosal invasion combined with tumor grade, lymphovascular invasion, and tumor budding. Compared to the current guidelines, the model improved the accuracy and practicality of risk stratification and reduced overtreatment by almost half. We also identified the protective role of female sex in T1 colorectal cancer. We suggest that early-onset and late-onset colorectal cancer patients should be analyzed separately. Further validation from multi-center prospective studies is warranted.

Fig. 2 Performance evaluation of the prediction model in predicting lymph node metastasis from the validation data set. A Receiver operating characteristic curve analysis comparing the performance of the risk factors of current guidelines, the guideline-combination model, and the prediction model in the validation cohort. B Comparison of overtreatment frequency among current guidelines, the guideline-combination model, and our prediction model. HR high risk, LR low risk, LNM lymph node metastasis

TG was excluded from the regression model due to multicollinearity (i.e., less specificity and high correlation with DSI-TG), which could potentially diminish the statistical significance of the model. Consequently, the model comprised five independent factors (depressed endoscopic gross appearance, sex, DSI-TG, LVI, TB) as LNM predictors. LVI emerged as the most potent predictor for LNM, increasing the odds of metastasis roughly tenfold (OR, 10.

Table 1
Clinicopathologic characteristics of the training and validation data sets
R rectum, D descending colon, S sigmoid colon, T transverse colon, A ascending colon, C cecum
*Student's t test (age and tumor size) or χ² test (other categorical variables)

Table 2
Univariate analysis of selected risk factors and logistic regression model to predict LNM in T1 colorectal cancer (training data set)
LNM lymph node metastasis, CI confidence interval, R rectum, D descending colon, S sigmoid colon, T transverse colon, A ascending colon, C cecum, DSI depth of submucosal invasion, TG tumor grade, DSI-TG depth of submucosal invasion combined with tumor grade

Table 3
Univariable analysis of sex for predicting LNM in EOCRC and LOCRC
LNM lymph node metastasis, EOCRC early-onset colorectal cancer, LOCRC late-onset colorectal cancer