Development and validation of prognostic nomograms for early-onset locally advanced colon cancer

Background: The incidence of colorectal cancer in patients younger than 50 years has been increasing in recent years. Objective: To develop and validate prognostic nomograms predicting overall survival (OS) and cancer-specific survival (CSS) for early-onset locally advanced colon cancer (EOLACC) based on the Surveillance, Epidemiology, and End Results (SEER) database. Methods: Nomograms were constructed based on the SEER database and the Cox regression model. Results: The entire cohort comprised 13,755 patients with EOLACC. In the nomogram predicting OS for EOLACC, T stage contributed the most to prognosis, followed by N stage, regional nodes examined (RNE) and surgery. The nomogram predicting CSS for EOLACC showed similar results. Various methods confirmed the discriminating superiority of the nomograms. X-tile software was used to classify patients as high-risk, medium-risk, or low-risk according to the risk score of the nomograms. The risk stratification effectively avoided the survival paradox. Conclusions: We established and validated nomograms for predicting OS and CSS based on a national cohort of 13,755 EOLACC patients. The nomograms could effectively resolve the survival paradox of the AJCC staging system and serve as an excellent tool for integrating clinical characteristics to guide the therapeutic choice for EOLACC patients.

AGING well known [4]. Therefore, early-onset colorectal cancer should gain more attention.
Colon cancer accounts for the vast majority of colorectal cancer, around 70% [3,5,6]. Although colorectal cancer is usually discussed as a general category, there are many differences, involving embryological origin, anatomy, function as well as treatments, between colon cancer and rectal cancer [7].
In addition, numerous studies tend to put patients with stage II and stage III colon cancer together in exploring prognostic information [8,9] due to the relatively consistent treatment strategies and follow-up principles. Therefore, this study focused on locally advanced colon cancer patients younger than 50 years (early-onset locally advanced colon cancer; EOLACC).
The Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (NCI) is a source for epidemiologic information on the incidence and survival rates of cancer in the United States [10]. Various studies have explored clinical problems by analyzing data from the SEER database, which has helped to further improve the treatment of cancer patients. Although widely used to evaluate the prognosis of various tumors, the American Joint Committee on Cancer (AJCC) staging system contains a survival paradox for locally advanced colon cancer, in that colon cancer patients with T3-4N0 (stage II) have a similar or even worse survival rate compared to those with T1-2N+ (stage III) [11][12][13]. This shortcoming of AJCC staging for colon cancer prompted the exploration of a new risk scoring system. The nomogram is widely applied to predict outcomes intuitively and effectively in medical studies. The length of each line in the nomogram indicates the impact of the corresponding variable on the outcome. Therefore, we aimed to develop and validate prognostic nomograms predicting overall survival (OS) and cancer-specific survival (CSS) for early-onset locally advanced colon cancer based on the SEER database.

Patient characteristics
The entire cohort from the SEER database comprised 13,755 patients with histologically confirmed locally advanced colon cancer who were younger than 50 years old; these were randomly assigned to a training group or a verification group at a ratio of 2:1. Table 1 summarizes the demographic, clinical and pathological characteristics of the study cohort.
The target population was mainly 40-49 years old (74.30%) and White (72.32%). Male patients slightly predominated over females (52.05% vs. 47.95%) in EOLACC. Meanwhile, early-onset patients with locally advanced left colon cancer were slightly more numerous than those with locally advanced right colon cancer (50.69% vs. 47.29%). Moreover, mucinous cell carcinoma (MCC)/signet ring cell carcinoma (SRCC) accounted for 13.41% of cases in this study. The proportion of stage III colon cancer (N+: N1 and N2) was higher than that of stage II (N0) (57.72% vs. 42.28%). Almost all of the patients (99.00%) had undergone colectomy and 64.18% of them received chemotherapy. More importantly, patients with RNE ≥ 12 accounted for 86.75% of the cohort. In addition, 3.56% received radiotherapy, which is not a conventional treatment for colon cancer. Table 2 summarizes the characteristics of EOLACC patients from the external verification group, which comprised 126 patients from China. All of the patients in the external verification group had undergone colectomy, and 66.67% of them received chemotherapy.

Screening independent prognostic factors
The independent prognostic factors affecting OS and CSS were identified by univariable and multivariable Cox regression models. Factors that were significant in the univariable analysis were entered into the Cox regression model for multivariable analysis. OS was significantly associated with 10 features: marital status, race, gender, pathological grade, T stage, N stage, surgery, chemotherapy, RNE and carcinoembryonic antigen (CEA) (Table 3). CSS was related to 9 variables: marital status, race, pathological grade, T stage, N stage, surgery, chemotherapy, RNE and CEA (Table 4).
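For reference, the standard form of the Cox proportional hazards model underlying this screening step can be written as follows. This is the textbook formulation, not an equation taken from the original article; the symbols $h_0(t)$, $x_j$, $\beta_j$ and $\mathrm{SE}(\hat\beta_j)$ are generic notation, and the Wald-type interval shown is one common choice that the article does not explicitly confirm.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j),
\qquad
95\%\ \mathrm{CI}_j = \exp\!\left(\hat\beta_j \pm 1.96\,\mathrm{SE}(\hat\beta_j)\right)
```

Under this model, a factor is typically regarded as an independent prognostic factor when its hazard ratio remains significantly different from 1 after adjustment for the other covariates.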

Development and verification of prognostic nomograms
Based on the results of the multivariable Cox regression models, nomograms predicting 3-, 5-, and 10-year OS and CSS were created with the independent prognostic factors. By adding up the scores assigned to each variable and projecting the total score onto the bottom scales, the estimated 3-, 5-, and 10-year OS and CSS probabilities can be read off directly.
The nomograms predicting OS and CSS for EOLACC showed that T stage contributed the most to prognosis, followed by N stage, RNE and surgery (Figures 1A and 2A). Various methods were then used to assess the discriminating superiority of the nomograms. The C-indexes of the nomograms are reported in Table 5. The calibration curves showed no obvious deviations from the reference line, indicating close agreement between actual observations and model predictions for 3-, 5-, and 10-year OS (Figure 1B, 1E
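The point-scoring logic described above can be sketched in a few lines of Python. This is a minimal illustration of how a nomogram is commonly derived from Cox coefficients, not the authors' implementation (the study used R). The coefficients in `HYPOTHETICAL_COEFS`, the choice of only three variables, and the `reference_survival` value are all assumptions made for illustration and are not taken from the paper's tables.

```python
import math

# Hypothetical log-hazard coefficients, chosen only for illustration.
# They are NOT the values estimated in the study.
HYPOTHETICAL_COEFS = {
    "t_stage": {"T1-2": 0.0, "T3": 0.6, "T4": 1.4},
    "n_stage": {"N0": 0.0, "N1": 0.4, "N2": 1.0},
    "chemotherapy": {"yes": 0.0, "no": 0.2},
}


def widest_range(coefs):
    """Largest spread of coefficients within any single variable."""
    return max(max(lv.values()) - min(lv.values()) for lv in coefs.values())


def point_scale(coefs):
    """Map each level to 0-100 points; the widest variable spans 100 points."""
    span = widest_range(coefs)
    scale = {}
    for variable, levels in coefs.items():
        floor = min(levels.values())
        scale[variable] = {
            level: 100.0 * (beta - floor) / span for level, beta in levels.items()
        }
    return scale


def total_points(patient, scale):
    """Sum the points of each of the patient's variable levels."""
    return sum(scale[variable][level] for variable, level in patient.items())


def survival_probability(points, reference_survival, coefs):
    """Convert total points to a survival probability at a fixed horizon.

    reference_survival is the survival of a patient with zero points
    (every variable at its lowest-risk level) at that horizon.
    """
    linear_predictor = points / 100.0 * widest_range(coefs)
    return reference_survival ** math.exp(linear_predictor)


scale = point_scale(HYPOTHETICAL_COEFS)
patient = {"t_stage": "T4", "n_stage": "N1", "chemotherapy": "yes"}
points = total_points(patient, scale)
estimated = survival_probability(points, 0.95, HYPOTHETICAL_COEFS)
```

In this scheme the variable with the widest coefficient range receives the longest line (100 points), which is how line length in a nomogram reflects each variable's contribution to the outcome.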

Performance of the nomograms in stratifying on the basis of risk points
X-tile software was used to classify patients as high-risk, medium-risk, or low-risk according to the risk scores of the nomograms. The cut-off values were 133 and 221 for OS (Figure 3A), and 130 and 200 for CSS (Figure 3B). The survival curves illustrate the survival paradox of the AJCC staging system for colon cancer: patients with T3-4N0 had survival similar to those with T1-2N+ (OS: p=0.975, Figure 4A; CSS: p=0.709, Figure 4F). Figures 4B and 4G show the correspondence between AJCC stage and the risk stratification in this study.
The risk stratification effectively avoided the survival paradox in this study. In the training cohort, the low-  Figure 4D and 4I) in the verification cohort. The risk stratification system was also applicable to patients from our institution (OS: 78.79% in the low-risk group; 65.18% in the medium-risk group; 0.00% in the high-risk group; CSS: 74.90% in the low-risk group; 61.89% in the medium-risk group; 0.00% in the high-risk group) (Figure 4E and 4J).
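The cut-off values reported in this section can be applied as a simple lookup. The sketch below uses the reported thresholds (133 and 221 for OS; 130 and 200 for CSS); the function name `risk_group` is hypothetical, and the handling of scores that fall exactly on a cut-off (assigned here to the lower-risk group) is an assumption, since the text does not specify it.

```python
# Cut-offs reported in the text: (low/medium boundary, medium/high boundary).
CUTOFFS = {"OS": (133, 221), "CSS": (130, 200)}


def risk_group(total_points, outcome="OS"):
    """Assign a nomogram total score to a risk group.

    Assumption: a score equal to a cut-off belongs to the lower-risk group.
    """
    low_cut, high_cut = CUTOFFS[outcome]
    if total_points <= low_cut:
        return "low-risk"
    if total_points <= high_cut:
        return "medium-risk"
    return "high-risk"
```

For example, under these assumptions a total score of 210 would be medium-risk on the OS nomogram but high-risk on the CSS nomogram, because the two outcomes use different thresholds.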

DISCUSSION
It is well established that the vast majority of colon cancer occurs in patients over 50 years old. Colon cancer screening, therefore, begins in an average-risk population aged ≥50 years old [14][15][16]. Meanwhile, numerous studies have focused on colon cancer as a whole or even on elderly patients with colon cancer, with the result that current treatment strategies are tailored for late-onset colon cancer (in patients >50 years old). However, early-onset colon cancer is epidemiologically, pathologically, biologically and metabolically different from late-onset colon cancer [17]. There is currently an unmet clinical need for diagnostic and therapeutic protocols dedicated to young individuals with colon cancer.
Although widely used to evaluate the prognosis of various tumors, the AJCC staging system contains a survival paradox for locally advanced colon cancer, in that colon cancer patients with T3-4N0 (stage II) have a similar or even worse survival compared to those with T1-2N+ (stage III) [11][12][13]. The survival paradox indicates that the AJCC staging system is inaccurate and insufficient for the medical demands related to locally advanced colon cancer. In fact, the root cause of the survival paradox is that T stage contributes more to prognosis than N stage, as the nomograms show. The risk stratification based on the points of the nomograms effectively avoids the survival paradox. In addition, the time-dependent ROC curves show that the nomograms possess superior sensitivity and specificity. The DCA curves indicate that the comprehensive nomograms are more conducive to making better clinical decisions in individual treatment than each independent predictor alone. Therefore, the survival nomograms for locally advanced colon cancer patients younger than 50 years based on the SEER database are able to accurately evaluate OS and CSS of EOLACC patients and effectively resolve the survival paradox.
Radical resection is the first-choice treatment for locally advanced colon cancer [18,19]. Both nomograms predicting OS and CSS indicated a substantial survival advantage of colectomy. Meanwhile, RNE is considered a priority in assessing the quality of surgery [18,20]. In fact, previous research identified RNE as an important prognostic factor [21,22]. There is general consensus that postoperative specimens from radical operations for colon cancer should contain at least 12 regional lymph nodes, as recommended by the National Comprehensive Cancer Network (NCCN) guidelines [2]. However, previous research indicated that young patients with colorectal cancer had a higher risk of lymph node metastasis [23]. Is a minimum of 12 RNE adequate for EOLACC?
The nomograms demonstrated that 30-35 RNE was the optimal option. Therefore, expanding lymph node dissection may be a more reasonable option for EOLACC patients.
Early-onset colon cancer patients were 2 to 4 times more likely to receive systemic chemotherapy, especially multiagent irinotecan-based or oxaliplatin-based regimens, than late-onset patients in each disease stage [24]. However, the more intense chemotherapy did not provide young individuals with survival benefits comparable to those in late-onset colon cancer [24]. The mismatch between tumor treatment management and relative survival highlights the possibility of overtreatment and the increased risk of chemotherapy-related toxicity for early-onset colon cancer patients. Similarly, Manjelievskaia reported that the addition of systemic chemotherapy cannot offer the same survival improvement for early-onset colon cancer [4]. The nomograms confirmed that chemotherapy, although an independent prognostic factor, contributed very little to improving OS and CSS of EOLACC in this study. Therefore, the most notable implication of this study is that excessive chemotherapy should be avoided in young colon cancer patients.
Patients with EOLACC were classified as high-risk, medium-risk, or low-risk according to the risk score of the nomograms in our study, which could provide a reference for EOLACC patients with respect to receiving chemotherapy or not.
Can early-onset patients (<50 years old) with locally advanced colon cancer be analyzed as a whole? This study divided the entire cohort into three subgroups according to age: 18-29, 30-39 and 40-49 years old. There was no significant difference in OS or CSS among the three subgroups in the Cox regression analysis. Therefore, we considered it reasonable to analyze early-onset colon cancer as a whole, as many studies have done [4,6,24,25,26]. Many studies have reported that the survival of colon cancer is related to the primary tumor location [27]. However, primary tumor location was not an independent prognostic factor in EOLACC patients in this study. The current treatment strategies, including surgery and chemotherapy, may bring similar survival benefits for right colon cancer and left colon cancer.
To the best of our knowledge, this study was the first to create and validate survival nomograms for EOLACC based on the SEER database. Previous nomograms [19,21,28], mainly addressing elderly patients, are not suitable for early-onset colon cancer patients owing to the unequal contribution of each prognostic factor, especially chemotherapy and RNE. Our nomograms focused on EOLACC and were verified with external data. However, there were some limitations in our study. Firstly, as this was a retrospective study, the nomograms still need to be validated by prospective studies. Secondly, detailed treatment information for the included patients was not recorded in the SEER cohort, so we could not investigate the effect of specific options, such as chemotherapy regimens and specific surgical methods, on the survival of EOLACC patients. Lastly, the nomograms need to be verified with more data since the sample size of the external verification group was small.
In conclusion, we established and validated nomograms for predicting OS and CSS based on a national cohort of 13,755 patients with EOLACC. The nomograms could effectively resolve the survival paradox of the AJCC staging system and serve as an excellent tool for integrating clinical characteristics to guide the therapeutic choice for EOLACC patients.

Data sources
The clinicopathological data of all EOLACC patients were retrieved from the SEER program. The SEER Program of the National Cancer Institute is an authoritative source of information on cancer incidence and survival in the United States that is updated annually. The target population was limited to patients who were older than 18 and younger than 50, with stage II and III colon adenocarcinoma (ICD-O-3: 8140, 8144, 8201, 8210, 8211, 8220, 8221, 8255, 8260, 8261, 8262, 8263, 8323, 8440, 8460, 8470, 8472, 8480, 8481, 8490), 14,056 patients in total. According to CS extension (http://web2.facs.org/cstage0205/colon/Colon_bao.html), T stage was re-classified to align with the 8th AJCC staging system. The exclusion criteria were: diagnosis at autopsy or by death certificate (n=6); survival time of 0 months (n=246); no positive histology (n=7); and missing detailed information for conversion to 8th AJCC staging (n=42). The final study sample contained 13,755 patients with early-onset locally advanced colon cancer (T3-4 and/or N+) (Figure 5).
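The cohort flow described above can be checked with simple arithmetic. The sketch below uses only the counts reported in the text; it assumes the four exclusion categories are mutually exclusive (which the reported totals are consistent with), and the names `INITIAL_COHORT`, `EXCLUSIONS` and `apply_exclusions` are illustrative.

```python
# Counts taken from the text; the helper and variable names are illustrative.
INITIAL_COHORT = 14056
EXCLUSIONS = {
    "diagnosed at autopsy or by death certificate": 6,
    "survival time of 0 months": 246,
    "no positive histology": 7,
    "insufficient detail for 8th AJCC staging": 42,
}


def apply_exclusions(initial, exclusions):
    """Return (reason, excluded, remaining) after each exclusion step.

    Assumes the exclusion categories do not overlap.
    """
    remaining = initial
    flow = []
    for reason, count in exclusions.items():
        remaining -= count
        flow.append((reason, count, remaining))
    return flow


flow = apply_exclusions(INITIAL_COHORT, EXCLUSIONS)
final_n = flow[-1][2]  # number of patients left after all exclusions
```

The four exclusions remove 301 patients in total, which reconciles the 14,056 initially identified patients with the final sample of 13,755.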

For each patient, the following demographic, clinical, pathological and therapeutic variables were acquired: gender, age at diagnosis, race/ethnicity, marital status, tumor size, tumor location, pathological grade, histological type, T stage, N stage, surgery, chemotherapy, radiotherapy, regional nodes examined (RNE), CEA and follow-up information. All qualified patients were randomly divided into two cohorts at a ratio of 2:1 (training cohort, n = 9170, and validation cohort, n = 4585).
126 EOLACC patients from the Department of Gastrointestinal Surgery of Xiangya Hospital, Central South University (Changsha, China) served as the external verification group. The admission time of these patients was from January 1, 2009 to July 31, 2019. The termination of follow-up was July 31, 2020, in this study. Patients with missing follow-up data were excluded.
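The 2:1 random allocation of the SEER cohort described above can be sketched as follows. This is an illustration only: the study performed its analysis in R, and the shuffling method, the seed, and the function name `split_two_to_one` are assumptions rather than details reported by the authors.

```python
import random


def split_two_to_one(patient_ids, seed=0):
    """Randomly split identifiers into training and validation sets (2:1).

    The seed is arbitrary and only makes the illustration reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = (2 * len(ids)) // 3  # two thirds go to the training cohort
    return ids[:n_train], ids[n_train:]


# Integer placeholders stand in for the 13,755 SEER patient records.
train_ids, validation_ids = split_two_to_one(range(13755))
```

With 13,755 patients this yields cohorts of 9170 and 4585, matching the sizes reported for the training and validation cohorts.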

Statistical analysis
Hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated using Cox regression models. Potential prognostic factors that were significant in the univariate Cox regression analysis were incorporated into the multivariate analysis. Then, based on the multivariate analysis results, nomograms were constructed in R software to predict 3-, 5-, and 10-year OS and CSS in EOLACC patients. The discriminative ability of the novel nomograms was assessed by the concordance index (C-index), the time-dependent receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). Calibration curves were plotted to compare the nomogram-predicted survival with the actual survival. Decision curve analysis (DCA) was performed to determine clinical usefulness by quantifying the net benefits at different threshold probabilities.
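As an illustration of the discrimination measure named above, the following is a minimal pure-Python sketch of Harrell's concordance index for right-censored data. It is not the routine used in the study (which relied on R), and it makes two simplifying assumptions: pairs with tied survival times are skipped, and tied risk scores count as half-concordant. The example data are invented for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] is truthy) and a strictly shorter time than patient j.
    It is concordant when patient i also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: follow-up in months, event indicator, nomogram points.
example_times = [5, 10, 15, 20]
example_events = [1, 1, 0, 1]
example_scores = [200, 150, 160, 90]
c_index = harrell_c_index(example_times, example_events, example_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk; in the invented example, four of the five comparable pairs are ordered correctly.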

Ethics approval
Approval from the ethical board for this study was not required because of the public nature of all the data.

AUTHOR CONTRIBUTIONS
Yuqiang Li, Wenxue Liu and Fengbo Tan conceived and designed the study. Yuqiang Li and Wenxue Liu wrote the article. Lilan Zhao downloaded and screened the data from the SEER database. Zhongyi Zhou, Heming Ge and Qian Pei collected the data from our institute used for external verification. All authors participated in analyzing the data. All authors read and approved the final manuscript.