Development and validation of prognostic factors for lymph node metastasis in endometrial cancer: A SEER analysis

Objective: The purpose of this study was to develop and validate a nomogram to predict lymph node metastasis (LNM) in patients with endometrial carcinoma (EC). Methods: Clinical data of EC patients diagnosed between 2004 and 2015 were retrieved from the Surveillance, Epidemiology, and End Results (SEER) Program registry. The nomogram was constructed from independent risk factors selected by multivariate logistic regression analysis. Accuracy was validated in both the training and validation groups using discrimination analysis and calibration curves. The predictive accuracy and clinical value of the nomogram and the Mayo criteria were compared using decision curve analysis (DCA). Results: The final study group consisted of 63,836 women who met specific inclusion criteria. The factors identified in the multivariate analysis as notable predictors of LNM were age, tumor size, histological type, cervical stromal invasion, tumor grade, and myometrial invasion. These risk factors were included in the nomogram. The discrimination (AUC) values of the nomogram and the Mayo criteria were 0.848 (95% CI: 0.843-0.853) and 0.806 (95% CI: 0.801-0.812), respectively. In the validation group, the AUC values were 0.847 (95% CI: 0.840-0.857) and 0.804 (95% CI: 0.796-0.813) for the nomogram and the Mayo criteria, respectively (P < 0.01). Calibration plots showed that the model was well calibrated in both the training and validation cohorts. DCA revealed that using the nomogram always had a positive net benefit compared with using the Mayo criteria. Conclusions: A nomogram was developed to predict LNM in EC patients based on a large population-based analysis. The nomogram showed good performance for predicting LNM in patients with EC.


Introduction
Endometrial cancer (EC) is the most common gynecologic malignancy and the fourth most common of all tumor types in Western countries. According to the American Cancer Society, there are approximately 63,230 new cases of endometrial cancer each year in the United States and 11,350 related deaths [1].
Symptoms such as abnormal vaginal bleeding usually present quickly after onset, so most EC patients are diagnosed at an early stage. Early diagnosis can allow for early treatment and good prognosis for EC patients. However, lymph node metastasis (LNM) is a major risk factor for recurrence and metastasis of prediction of a specific event. In general, nomograms used for predicting patient outcomes discriminate patients with a future event from those without. The performance of the nomogram is initially validated by providing discrimination and calibration values [7]. Discrimination evaluates whether the model is able to discriminate patients (distinguish one patient from another) and is generally expressed with area under the curve (AUC) values [8]. Calibration describes how close predicted and actual outcomes are.
Many researchers have attempted to identify risk factors for LNM and to predict the probability of LNM by developing nomograms constructed from different risk factors. The Mayo risk stratification model predicts the risk of LNM based on four factors: tumor grade, histological type, tumor size, and myometrial invasion. Some studies have also compared the Mayo criteria with other models, including nomograms designed as part of those studies [9,10]. However, because of their small patient populations, these studies could not provide crucial information on how small incremental changes in tumor size may affect patient outcomes. To date, there is no published model for predicting lymph node metastasis in EC patients based on a large cohort.
Because of this need, we designed a nomogram to predict LNM based on data from the Surveillance, Epidemiology, and End Results (SEER) database. In this study, we developed and internally validated a parametric model for predicting LNM. The model includes pathological characteristics based on a mathematical algorithm from a large population. Decision curve analysis also was used to estimate the clinical value of the nomogram and compare it with the Mayo criteria.

Patients and study design
The National Cancer Institute's SEER database on cancer research is freely available to the public upon submission of a signed data-use agreement to the SEER administration. We extracted data for endometrial cancer cases diagnosed between 2004 and 2015 from the SEER registry for further analysis. Inclusion criteria were as follows: age at diagnosis > 18 years, and endometrial cancer pathologically confirmed by histology (histological code: 8140-8389 for EEA, 8440-8499 for SEA). Patients who had a history of prior malignancy, or who had missing information regarding lymph node metastasis, race, marital status, tumor size, histology, myometrial invasion, cervical stromal invasion, or tumor grade were excluded. A total of 63,836 patients in the SEER cohort met the criteria and were selected for further analysis. Patients were then randomly divided in a 2:1 ratio into a training cohort (n=42,558) and an internal validation cohort (n=21,278). The flow chart used for data selection is shown in Figure 1. The ethics committee board of Peking University People's Hospital approved the use of patient data for this study.
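As an illustration only, the 2:1 random split described above could be sketched in Python as follows. The function name `split_two_to_one`, the fixed seed, and the choice to round the training size up are assumptions made for this sketch, not details reported by the study; rounding up happens to reproduce the cohort sizes given above.

```python
import random

def split_two_to_one(patient_ids, seed=0):
    """Randomly split IDs into training and validation sets at a 2:1 ratio."""
    ids = list(patient_ids)
    rng = random.Random(seed)        # fixed seed (an assumption) for reproducibility
    rng.shuffle(ids)
    n_train = -(-2 * len(ids) // 3)  # ceil(2n/3); rounding up is an assumption
    return ids[:n_train], ids[n_train:]

# Placeholder integer IDs stand in for the 63,836 eligible SEER patients.
train_ids, valid_ids = split_two_to_one(range(63836))
print(len(train_ids), len(valid_ids))  # 42558 21278
```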

Statistical analyses
All categorical variables were described as frequencies and percentages. Univariate and multivariate logistic regression analyses were used to identify independent risk factors predictive of LNM and to develop the nomogram in the training cohort. All variables in the univariate analysis with P<0.05 were considered statistically significant and selected for multivariate analysis. Candidate variables with P<0.05 were selected using a backward stepwise selection from the full multivariate model. The nomogram was then constructed using these candidate variables.
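To make the link between a logistic regression model and a nomogram concrete, the following Python sketch converts log-odds effects into a predicted probability and into points scaled so that the strongest predictor spans 0 to 100. All coefficients, the intercept, the variable coding, and the function names are hypothetical values chosen for illustration; they are not the fitted values from this study, and this is not the rms implementation.

```python
import math

# Hypothetical log-odds effects per level; NOT the coefficients fitted in this study.
INTERCEPT = -4.0
EFFECTS = {
    "cervical_stromal_invasion": {"no": 0.0, "yes": 1.2},
    "myometrial_invasion": {"no": 0.0, "yes": 1.0},
    "tumor_grade": {1: 0.0, 2: 0.4, 3: 0.8},
}

def predicted_probability(patient):
    """Logistic model: probability = 1 / (1 + exp(-linear predictor))."""
    lp = INTERCEPT + sum(EFFECTS[var][level] for var, level in patient.items())
    return 1.0 / (1.0 + math.exp(-lp))

def nomogram_points(patient):
    """Rescale each effect so the variable with the widest range spans 0-100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in EFFECTS.values())
    scale = 100.0 / widest
    return {
        var: scale * (EFFECTS[var][level] - min(EFFECTS[var].values()))
        for var, level in patient.items()
    }

patient = {"cervical_stromal_invasion": "yes", "myometrial_invasion": "no", "tumor_grade": 3}
points = nomogram_points(patient)
print(round(predicted_probability(patient), 3))  # 0.119
print(round(sum(points.values()), 1))            # 166.7
```

The total score is a monotone transformation of the linear predictor, which is why a single points axis can be mapped to a predicted probability.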
The performance of the nomogram was assessed in both the training and validation groups by calculating discrimination and calibration criteria [11]. Discrimination was quantified using the area under the receiver operating characteristic (ROC) curve. The area under the curve (AUC) is a summary measure of the ROC that reflects the ability of a test to discriminate the outcomes across all possible levels of positivity. AUC values range from 0 to 1, and a model is considered to have poor, fair, or good performance if the AUC value is between 0.5 and 0.6, between 0.6 and 0.7, or greater than 0.7, respectively. A calibration plot was generated to visualize how far the predictions were from the actual outcomes, displaying mean nomogram-based predictions in the training and validation cohorts on the horizontal axis versus actual observed LNM probabilities.
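The two measures can be illustrated with a small Python sketch. It computes the AUC as the proportion of positive/negative patient pairs that the model ranks correctly (ties count one half) and builds a simple calibration table of mean predicted versus observed rates. The function names, the equal-size grouping, and the toy data are assumptions for illustration, not the study's procedure or data.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_table(scores, labels, n_bins=2):
    """Return (mean predicted risk, observed event rate) per risk-ordered group."""
    ranked = sorted(zip(scores, labels))
    size = -(-len(ranked) // n_bins)  # ceil(n / n_bins) patients per group
    table = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        mean_pred = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, observed))
    return table

# Toy predicted probabilities and LNM outcomes (1 = LNM present); not study data.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
toy_labels = [0, 0, 1, 0, 1, 1]
print(round(auc(toy_scores, toy_labels), 3))  # 0.889
for mean_pred, observed in calibration_table(toy_scores, toy_labels):
    print(round(mean_pred, 2), round(observed, 2))
```

A well-calibrated model yields pairs that lie close to the diagonal when plotted.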
Finally, decision curve analysis was performed to quantify the clinical usefulness of the model. Such analyses can determine the ability of a model to predict fine-scale outcomes based on a set of risk parameters. A model that performs well in the decision curve analysis has a higher net benefit than a model that simply classifies all patients as having the predicted outcome or no (zero) patients as having the outcome. Decision curve analysis can also be used to compare the net benefits of multiple models.
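As a minimal sketch of the quantity plotted in a decision curve, the net benefit at a threshold probability p_t is commonly defined as TP/n - (FP/n) × p_t/(1 - p_t), and it is compared with the "treat all" and "treat none" strategies. The Python below uses that standard definition; the function names and toy data are assumptions for illustration and are not taken from the study.

```python
def net_benefit(scores, labels, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Net benefit when every patient is classified as having the outcome."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy predicted probabilities and outcomes; not study data.
dca_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
dca_labels = [0, 0, 1, 0, 1, 1]
print(round(net_benefit(dca_scores, dca_labels, 0.25), 3))    # 0.444
print(round(net_benefit_treat_all(dca_labels, 0.25), 3))      # 0.333
# "Treat none" has a net benefit of 0 at every threshold.
```

Repeating the calculation over a grid of thresholds and plotting the results gives the decision curve for each model.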
All analyses were performed using SPSS 21.0 and R software version 3.4.4 (https://www.r-project.org/), using the rms, presence/absence, and decision curve packages. P<0.05 was considered statistically significant.

Results

Clinical characteristics of patients
A total of 63,836 patients were included in the study. Of these, 42,558 were assigned to the training cohort and 21,278 to the validation cohort. Figure 1 shows a schematic of the screening process. The mean ages of patients in the training and validation sets were 62.41 ± 11.62 years and 62.43 ± 11.58 years, respectively. Tumor size was 6.58 ± 15.63 cm in the training cohort and 6.56 ± 15.62 cm in the validation group. In the training cohort, most patients (88.45%) were negative for LNM. By age, 3.49% of patients were younger than 40 years, 10.11% were between 41 and 50 years, 30.79% were between 51 and 60 years, 31.86% were between 61 and 70 years, and 23.75% were older than 71 years. Most of the patients in both cohorts were white (82.40%) and married (53.79%), and had a tumor size between 2 cm and 5 cm (49.36%). The most common pathological characteristics were EEA (87.27%), no myometrial invasion (65.10%), no cervical stromal invasion (79.80%), and grade 1 tumors (41.50%). The two sets showed similar results for nearly all variables. Table 1 shows the details of the demographic and pathological characteristics of the patients in the two cohorts.

Risk factors for lymph node metastasis
Our univariate analysis of the training cohort data considered age at diagnosis, marital status, race, tumor size, histological type, myometrial invasion, cervical stromal invasion, and tumor grade as potential risk factors for LNM. Multivariate logistic regression analysis showed that the independent risk factors associated with LNM were age at diagnosis, tumor size, histological type, myometrial invasion, cervical stromal invasion, and tumor grade (

Design and validation of the nomogram
Based on the independent risk factors identified in the multivariate regression analysis, we designed a nomogram to predict LNM in EC patients (Figure 2). Among the variables considered in the predictive model, cervical stromal invasion was identified as the most important predictive factor in the LNM nomogram. Point assignments and predictive scores for each variable in the nomogram models were calculated, with the total score corresponding to a predicted probability of LNM. The performance of the final model was assessed through discrimination and calibration analyses. Based on these analyses, the

Optimal threshold of the nomogram
Each patient was assigned a score using the calibrated nomogram. Then, an optimal cut-off value of 200 points was selected to maximize sensitivity and specificity of average scores in the ROC curve. Patients from training and validation cohorts were divided into low-risk (score < 200 points) and high-risk (score ≥  (Table 4). In the validation cohort, the predicted rates of LNM were 4.8% and 33.7% in the low-risk and high-risk groups, respectively, according to the nomogram, and 5.6% and 25.9%, respectively, according to the Mayo criteria (Table 5).
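A cut-off that jointly maximizes sensitivity and specificity is usually found with Youden's index, J = sensitivity + specificity - 1. The Python sketch below scans candidate cut-offs and returns the one with the largest J; the function name and the toy scores are assumptions chosen for illustration, so the resulting cut-off is a property of the toy data rather than a reproduction of the study's analysis.

```python
def youden_cutoff(points, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified as high-risk when points >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(points)):
        tp = sum(1 for s, y in zip(points, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(points, labels) if s < cutoff and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # keep the lowest cut-off when several tie
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Toy nomogram scores and LNM outcomes (1 = LNM present); not study data.
toy_points = [120, 150, 180, 200, 230, 260]
toy_outcomes = [0, 0, 0, 1, 0, 1]
print(youden_cutoff(toy_points, toy_outcomes))  # (200, 0.75)
```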

Decision curve analysis
The decision curve analysis results for the nomogram and Mayo models are shown in Supplementary Figure 1A (training cohort) and Supplementary Figure 1B (validation cohort). For predicted probability thresholds between 0% and nearly 60%, the nomogram showed a positive net benefit for both cohorts.

Discussion
Endometrial cancer (EC) is one of the most common types of gynecologic malignancy. It is estimated that 63.4 out of every 100,000 people in China in 2015 were diagnosed with EC [12]. One of the most important prognostic factors for EC is the presence of LNM. Nevertheless, there is an ongoing debate concerning the practicality of lymphadenectomy. Some studies have shown that traditional lymphadenectomy may not improve disease-free survival (DFS) or overall survival, while other studies have suggested the opposite [13][14][15]. Therefore, it is crucial to distinguish patients with a low risk of LNM from those with a high risk. Currently, most studies have followed the Mayo criteria for predicting LNM risk and compared them with different models. According to one study based on the Mayo criteria [16], 78.9% of the studied patients were at high risk for nodal metastasis, but the actual LNM rate was only 6.4%. Thus, almost 70% of patients without LNM were over-treated. Multiple retrospective studies have shown that a low-risk subset of EC patients has a low overall risk of lymph node involvement [17][18][19]. Mariani et al. performed a large retrospective study of EC patients, which determined that patients with myometrial invasion < 50%, tumor grade 1 or 2, and tumor size < 2 cm were at low risk for lymph node involvement [20]. Vargas et al. analyzed 19,329 patients with EC, and the results suggested that EC patients for whom the Mayo criteria predicted a low risk of LNM did have a low incidence of metastasis [16]. However, the number of patients in these studies was relatively small. Therefore, our goal was to conduct a large-cohort study to identify risk factors for LNM and develop accurate risk stratification models to separate patients with EC into low-risk and high-risk categories. The benefit of this model would be to spare patients unnecessary lymph node dissection and the associated surgical morbidity.
Nomograms and clinicopathologic variables have been used by some researchers to estimate the risk of metastasis. In the current study, we used the large-scale, population-based SEER database to develop and validate a convenient nomogram with which doctors can make individualized predictions of LNM in patients with EC. Data were collected for a total of 63,836 EC patients, who were randomly divided into two groups.
Several prognostic factors were identified for the SEER patients in this study, including age at diagnosis, marital status, race, tumor size, histological type, myometrial invasion, cervical stromal invasion, and tumor grade. Furthermore, calibration results indicated that the predictions made using the nomogram fit well with observations for both groups. We used 200 points as the cut-off value based on Youden's index [21] and divided the patients from both cohorts into low-risk (score < 200 points) and high-risk groups (score ≥ 200 points). We then compared the performance of our nomogram with that of the Mayo criteria for predicting LNM. The nomogram showed better discrimination than the Mayo criteria in both the training cohort (AUC of 0.754, 95% CI, 0.72-0.83 vs. 0.716, 95% CI, 0.60-0.67; P < 0.001) and the validation cohort (AUC of 0.751, 95% CI, 0.66-0.75 vs. 0.714, 95% CI, 0.54-0.59; P < 0.001). Decision curve analysis showed that the use of our nomogram had a positive benefit compared to the Mayo criteria. The lymph node metastasis rates were 4.80% and 34.0% in the low-risk and high-risk groups, respectively, according to the nomogram, while the rates were 5.7% and 26.4%, respectively, according to the Mayo criteria.
Several previous studies have reported different risk-scoring nomograms for predicting LNM. Sofiane constructed a predictive model to identify high-risk LNM patients in early-stage type 1 EC [22]. The Korean Gynecologic Oncology Group (KGOG) developed a preoperative assessment of LNM in EC that included MRI features and serum CA125 levels [23]. Lymph-vascular space invasion (LVSI) has also been shown to have a great impact on LNM in some studies [24,25]. However, these studies had smaller populations, and thus their predictive values were lower than that of the nomogram presented here.
Some research has also concentrated on comparing different models. Gokhan et al. compared the Mayo criteria and Milwaukee risk stratification models and found that the Mayo model was more accurate for predicting LNM than the Milwaukee model [26]. Tuomi et al. compared the performance characteristics of three risk-stratification models (the Mayo, Helsinki, and Milwaukee models) and found that these models had similar accuracies for predicting lymphatic dissemination in EC patients [27].
The nomogram may have performed better than the Mayo model for predicting LNM because of the predictive value assigned to two specific factors that differ between the models: age at diagnosis and cervical stromal invasion (the other four factors performed the same in both models). Age at diagnosis had only a small impact on LNM according to the nomogram algorithm, whereas cervical stromal invasion was shown to be a strong predictor of LNM. A previous study from our institute also revealed that cervical stromal invasion is an independent risk factor for LNM (in addition to LVSI, tumor grade, and myometrial invasion) in EC patients and may have an important role in predicting LNM [28]. Another study also suggested that cervical stromal invasion was useful in estimating LNM risk for EC patients and directing therapeutic strategies [29]. This may be because the degree of metastasis and invasion in a particular EC patient is a good predictive indicator of LNM. The stronger the invasive ability of cancer cells, the greater the likelihood of LNM. Moreover, according to the 2014 FIGO staging system, cervical stromal invasion causes the cancer to be classified as stage II, and the next step in the progression is local and/or regional spread of the cancer [30].
There were several limitations in the present study. First, the study was conducted retrospectively, and some selection bias may have occurred. Second, several critical prognostic factors, such as LVSI and menopausal status, were unavailable in the SEER database. Third, prospective datasets need to be used to externally validate the nomogram developed in this study. Addressing these areas should be the focus of future research.

Declarations
Funding This study was funded by grants from the National Natural Science Foundation of China (Grant Nos. 81874108 and 81802607).

Conflicts of Interest
We declare that we have no conflict of interest.

Figure: Nomogram to predict lymph node metastasis for EC patients.