Nomograms that predict the survival of patients with adenocarcinoma in villous adenoma of the colorectum: a SEER-based study

Because knowledge of adenocarcinoma in villous adenoma of the colorectum is limited to several case reports, we designed a study to investigate independent prognostic factors and developed nomograms for predicting patient survival. Univariate and multivariate Cox regression analyses were used to evaluate prognostic factors. A nomogram predicting cancer-specific survival (CSS) was constructed; internally and externally validated; evaluated by receiver operating characteristic (ROC) curve, C-index, and decision curve analysis (DCA); and compared with 7th TNM staging. Patients with adenocarcinoma in villous adenoma of the colorectum had a 1-year overall survival (OS) rate of 88.3% (95% CI: 87.1–89.5%), a 3-year OS rate of 75.1% (95% CI: 73.3–77%) and a 5-year OS rate of 64.5% (95% CI: 62–67.1%). Nomograms for 1-, 3- and 5-year CSS predictions were constructed and achieved a higher C-index than 7th TNM staging (internal: 0.716 vs 0.663; P < 0.001; external: 0.713 vs 0.647; P < 0.001). Additionally, the nomogram showed good agreement between internal and external validation. According to DCA, compared with 7th TNM staging, the nomogram showed a greater benefit across the period of follow-up in both the internal and external cohorts. Age, race, T stage, pathologic grade, N stage, tumor size and M stage were prognostic factors for both OS and CSS. The constructed nomograms were more effective and accurate for predicting the 1-, 3- and 5-year CSS of patients with adenocarcinoma in villous adenoma than 7th TNM staging.


Background
According to global cancer statistics in 2018, colorectal cancer (CRC) is the third most common cancer, with 97,220 new cases of colon cancer and 43,030 new cases of rectal cancer worldwide [1]. There are three pathways involved in the pathogenesis of sporadic CRC: the classic colorectal adenoma (CRA)-adenocarcinoma pathway, the de novo pathway and the inflammatory cancer pathway. Among these pathways, the adenoma-adenocarcinoma pathway is the most common mechanism for the development of CRC [2]. Adenomatous polyps account for approximately 60-70% of all colonic polyps and are divided into tubular adenomas, villous/tubulovillous adenomas (VA/TVAs), sessile serrated adenomas (SSAs) and traditional serrated adenomas (TSAs); TSAs are often admixed with SSA and VA/TVA [3]. Villous adenoma is characterized pathologically by more than 75% villous features, with or without epithelial projections. According to previous studies, adenomas with villous features are more likely than other adenomas to progress to advanced neoplasia or dysplasia [4]. Moreover, larger adenoma size and a greater number of adenomas increase the risk of progression [5]. The results of a multicenter cohort study suggested that adenomas of more than 2 cm in diameter and with high-grade dysplasia were highly correlated with the development of CRC (HR: 9.25, 95% CI, 6.39-13.39) [6]. Although mounting evidence has suggested that villous adenoma is correlated with adenocarcinoma, current knowledge of the survival rate of patients with adenocarcinoma in villous adenoma is limited to a small series of studies [7][8][9][10][11]. The first report described a 19-year-old male with carcinoma arising from a villous adenoma [12].
According to a recent case report, a 71-year-old female patient with intramucosal adenocarcinoma in villous adenoma had a recurrence at the ulcer scar site after 19 months, which was attributed to an inadequate pathological examination. After endoscopic submucosal dissection (ESD), there were no signs of recurrence during 9 years of follow-up [10]. Hence, identifying prognostic factors for patients with adenocarcinoma in villous adenoma is a vital part of the assessment and therapy of CRC.
The Surveillance, Epidemiology, and End Results (SEER) program contains detailed research data on many kinds of tumors that cover almost 30% of the population in the United States [13]. Additionally, nomograms are widely used to assess the prognosis of cancers because of their ability to transform a statistical predictive model into a single numerical estimate of the probability of an event, which is a user-friendly method that guides clinical decision-making for doctors [14]. Therefore, in our study, we utilized a nomogram to analyze the impact of clinical characteristics such as TNM stage and tumor size on the survival rate of patients with adenocarcinoma in villous adenoma using the SEER database.

Data source
A total of 970,163 patients with CRC were identified from 2004 to 2015. All data were extracted from the SEER database of the United States, which covers abundant information on cancers. SEER*Stat software (version 8.3.6, downloaded from http://seer.cancer.gov/seerstat/) was used to extract patient information from the SEER database.

Population selection
To obtain the necessary information from the database, we established criteria to exclude unsuitable records. As shown in Fig. 1, we carefully reviewed the patient information. The inclusion criteria were as follows: (1) positive pathological diagnosis; (2) sufficient information about survival; and (3) available follow-up data. The exclusion criteria were as follows: (1) pathological diagnosis not adenocarcinoma in villous adenoma (ICD-O-3 Hist/behav, malignant: 8261/3); (2) no detailed information about the specific cause of death or other cause of death; (3) no information on AJCC TNM status; (4) unknown race of patient; and (5) no record of tumor number and pathological grade. Missing values are listed in Supplementary Table 1.

Study variables
Several variables were extracted from the SEER database, including age, race, sex, T stage, N stage, M stage, pathological grade of the tumor, number of tumors and tumor size. Patients were divided by age into < 50 years, 50-59 years, 60-69 years and ≥ 70 years. Race was classified as black, white, and other. Pathological grade was categorized as well differentiated (grade I), moderately differentiated (grade II), poorly differentiated (grade III), and undifferentiated (anaplastic, grade IV). The T stage was divided into Tis, T1, T2, T3, T4 and TX. The N stage was described as N0 (No), N1 (Yes), N2 (Yes) and NX. For M stage, M0 indicated negative metastasis, while M1 indicated positive metastasis. Tumor size was separated into < 5 cm, ≥ 5 cm and unknown. The number of tumors was divided into two groups: 1 tumor or more than 1 tumor.
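The age and tumor-size bands described above can be expressed as a small sketch. The study itself was carried out in R with SEER*Stat exports; the Python below is only an illustration, and the function names `age_group` and `size_group` are hypothetical helpers, not part of any SEER tooling.

```python
def age_group(age_years):
    """Map age at diagnosis to the four age bands described in the text."""
    if age_years < 50:
        return "<50"
    if age_years < 60:
        return "50-59"
    if age_years < 70:
        return "60-69"
    return ">=70"


def size_group(size_cm):
    """Map tumor size in cm to the size bands; None stands for 'not recorded'."""
    if size_cm is None:
        return "unknown"
    return "<5 cm" if size_cm < 5 else ">=5 cm"
```

Writing the cut-offs as half-open intervals makes the boundary cases explicit: a patient aged exactly 70 falls in the oldest band, and a tumor of exactly 5 cm falls in the larger size band.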

Statistical analysis
As described in the previous section, the demographic characteristics and clinicopathological information of the patients are summarized in Table 1. Differences in the baseline characteristics between patients who died from cancer and patients who died from other causes were assessed by the chi-square test. Overall survival (OS) and cancer-specific survival (CSS) were regarded as the primary endpoints of our study. The potential factors associated with OS and CSS were analyzed by univariate and multivariate Cox regression analyses. Survival curves were obtained by the Kaplan-Meier (K-M) method and stratified by clinicopathological characteristics. To construct the nomogram, we first performed the multivariate Cox regression analysis with the "coxph" function in the "survival" package; we then used the "step" function to select variables according to the Akaike Information Criterion (AIC), a well-known criterion for variable selection; based on the AIC value, we determined the variables used to build the nomogram; finally, we used the "plot" function and "nom" function in the "rms" package to construct the nomogram model. The survival curves, ROC curves, C-index and calibration curves were calculated using the "rms", "foreign" and "survival" packages in R software (Version 3.5.0). A competing-risk model was established via the "cmprsk" package. All packages used in our manuscript were obtained from the website (https://www.r-project.org/). All results were considered to be statistically significant when the P value was less than 0.05.
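The AIC-based selection step can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and the log-likelihoods and parameter counts below are made-up numbers chosen only to show the arithmetic; they are not results from this analysis.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (made-up) partial log-likelihoods and parameter counts
# for two candidate Cox models; NOT values from the study.
candidates = {
    "age + T + N + M": (-2900.0, 11),
    "age + T + N + M + race": (-2899.5, 13),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # model with the smallest AIC
```

In this toy comparison the extra variable improves the log-likelihood by only 0.5 while adding two parameters, so its AIC is higher and the smaller model is preferred. This is the kind of trade-off a stepwise AIC search evaluates at each step.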

Patient characteristics
As depicted in Supplementary Figure 1, according to the criteria set at the beginning of our study, we finally extracted 2813 patients who were diagnosed with adenocarcinoma in villous adenoma by histopathology from the SEER database. Table 1 lists the demographic and clinical characteristics of these patients. Of the 2813 patients, 666 died: 398 died from adenocarcinoma, and 268 died from other causes. In the whole cohort, six variables (age, grade, tumor size, T stage, N stage and metastasis) differed significantly between patients who died from adenocarcinoma and those who died from other causes, while no significant differences were observed for race, sex or tumor number.

Survival analysis
As shown in Fig. 1 and Table 2, overall, the patients had a 1-year OS of 88.3% (95% CI: 87.1-89.5%), 3-year OS of 75.1% (95% CI: 73.3-77%) and 5-year OS of 64.5% (95% CI: 62-67.1%). As shown in Table 2, characteristics such as age, TNM stage and pathological grade indicated that advanced tumors strongly affected survival, and the size and number of tumors also affected prognosis: the larger the tumor and the greater the number of tumors, the shorter the survival time. In line with the results shown in Table 2, the analysis of OS by Kaplan-Meier plots revealed that age, race, pathological grade, N stage, T stage, metastasis, tumor size and tumor number were prognostic factors (Supplementary Figures 2, 3 and 4). Subsequently, we performed univariate and multivariate Cox regression analyses for OS and CSS (Tables 3 and 4).
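For readers unfamiliar with how fixed-time survival rates such as the 1-, 3- and 5-year OS are read off a Kaplan-Meier curve, a minimal sketch of the product-limit estimator follows. It is a Python illustration, not the R code used in the study, and the helper names `kaplan_meier` and `survival_at` are assumptions introduced here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Toy follow-up data in months (made up for illustration only).
toy_curve = kaplan_meier([6, 12, 12, 20, 30, 40], [1, 0, 1, 1, 0, 1])
one_year = survival_at(toy_curve, 12)
three_year = survival_at(toy_curve, 36)
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from a simple proportion of patients still alive.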
With regard to OS, in multivariate analysis, age, race, T stage, metastasis, tumor size and tumor number were identified as prognostic factors. For example, compared with patients more than 70 years old, patients less than 50 years old had a markedly lower mortality risk (HR: 0.175, 95% CI: 0.123-0.249). Black race, advanced T stage and M stage, greater tumor number and larger tumor size were also risk factors for survival.

Performance of the nomograms
To construct a survival prediction model, we selected CSS as the main outcome and then built a nomogram. As listed in Table 4, patients with age > 70 years, advanced T stage, distant metastasis, positive lymph node metastasis (LNM) and larger tumor size (> 5 cm), as well as black patients, had a worse prognosis. Race and tumor size were not included in the nomogram because the AIC value was clearly larger when they were added. Therefore, we established a nomogram based on the four remaining prognostic factors: age, T stage, N stage and M stage (Fig. 2). According to the nomogram, T stage contributed the most to the prognosis of patients with adenocarcinoma in villous adenoma, followed by M stage and age, whereas positive LNM contributed the least to predicting survival. To use the nomogram, the points assigned to each predictor are read on the 0-10 scale at the top and then summed. The total is located on the "Total Points" scale, and a straight line drawn down from it to each time point gives the corresponding prediction for 1, 3 and 5 years.
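The point-reading procedure can be made concrete with a short sketch of how a Cox-based nomogram converts coefficients into points. This is a Python illustration rather than the authors' R workflow. All coefficients and the baseline survival value are hypothetical placeholders, not the estimates behind Fig. 2, and the 0-10 point scale follows the description in the text.

```python
import math

# Hypothetical log hazard ratios (NOT the study's estimates).
# The reference level of each predictor has coefficient 0.0.
COEFS = {
    "age": {"<50": 0.0, "50-59": 0.3, "60-69": 0.6, ">=70": 1.2},
    "T": {"T1": 0.0, "T2": 0.5, "T3": 1.0, "T4": 2.0},
    "N": {"N0": 0.0, "N1": 0.4, "N2": 0.8},
    "M": {"M0": 0.0, "M1": 1.5},
}
MAX_POINTS = 10.0
# The predictor with the widest coefficient range spans the full point scale.
WIDEST = max(max(v.values()) - min(v.values()) for v in COEFS.values())


def points(predictor, level):
    """Points for one predictor level, scaled so the widest range maps to 0-10."""
    levels = COEFS[predictor]
    return (levels[level] - min(levels.values())) / WIDEST * MAX_POINTS


def total_points(patient):
    """Sum the points over all predictors for one patient."""
    return sum(points(p, lvl) for p, lvl in patient.items())


def survival_probability(total, baseline_survival):
    """S(t | x) = S0(t) ** exp(lp), with lp recovered from the total points."""
    lp = total * WIDEST / MAX_POINTS
    return baseline_survival ** math.exp(lp)


patient = {"age": ">=70", "T": "T3", "N": "N0", "M": "M0"}
total = total_points(patient)
# 0.95 is a made-up baseline survival for the reference patient at one time point.
prob = survival_probability(total, 0.95)
```

Because points are proportional to each coefficient's range, the predictor with the longest axis on the nomogram is the one with the largest effect on the linear predictor, which is how the relative contributions of the predictors are read from the figure.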

Validation of the nomogram model
To investigate the validity of the nomogram, we divided the patients into internal and external cohorts according to the year of diagnosis (2004-2009 group and 2010-2015 group) and determined the C-index. As listed in Table 5, the C-index in the internal cohort was 0.716 (95% CI, 0.684-0.773), which was higher than that of TNM stage (C-index, 0.663, 95% CI, 0.603-0.734), suggesting that the nomogram was more effective for predicting survival than TNM stage. Likewise, in the external cohort, the nomogram was superior to TNM stage (C-index, 0.713, 95% CI, 0.641-0.794 vs 0.647) (Table 5). Compared with the TNM stage model, the nomogram was better at predicting prognosis at 1 year, 3 years and 5 years (Fig. 3a-c). In the external cohort, the nomogram also performed better than TNM stage (Table 5 and Fig. 3d-f). Furthermore, to compare the clinical usability of the nomogram and TNM stage, we performed decision curve analysis (DCA). As shown in Fig. 4, in both the internal cohort and the external cohort, the predictive efficiency of the nomogram was better than that of TNM stage for 1-year, 3-year and 5-year survival.
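The two validation metrics can be sketched as follows. This is a Python illustration under stated assumptions, not the R code used in the study: `harrell_c_index` is a plain pairwise implementation of Harrell's concordance index, and `net_benefit` is the simplified binary-outcome form of the decision-curve net benefit, which ignores censoring. All input data are toy values.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: fraction of usable pairs ordered correctly by risk score."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when i has an observed event strictly before j's time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def net_benefit(outcomes, predicted_risks, threshold):
    """Net benefit = TP/n - FP/n * pt/(1 - pt) for a binary outcome (no censoring)."""
    n = len(outcomes)
    pairs = list(zip(outcomes, predicted_risks))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (made up): four patients, one censored.
c_index = harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1])
nb = net_benefit([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.7], 0.5)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so values such as 0.716 versus 0.663 are compared on that scale. A decision curve plots net benefit against the threshold probability, and the model with the higher curve over the clinically relevant range is preferred.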

Discussion
Colorectal adenomatous polyps are considered the main precursor of advanced lesions. According to current postpolypectomy surveillance guidelines, patients who have adenomas with villous elements are considered at high risk of developing advanced lesions; in addition, larger adenoma size (≥ 10 mm) increases the risk [15]. Although colonoscopy surveillance and resection could reduce the risk of developing carcinoma, the risk of CRC after adenoma removal remains high, and the removal of adenoma does not always prevent CRC because the initial adenoma features are not well known [16,17]. Moreover, knowledge of adenocarcinoma in villous adenoma is still limited to case reports and several studies. According to the current case reports, tumor recurrence was frequent due to inaccurate pathological diagnoses; however, the prognosis was good if the lesion was resected entirely [10]. In addition, the treatment strategies for adenocarcinoma in villous adenoma differ according to clinical behavior [18]. Hence, it is of clinical significance to accurately predict the prognosis of patients with adenocarcinoma in villous adenoma. In the present study, we analyzed the potential risk factors associated with colorectal adenocarcinoma in villous adenoma. In total, we identified 2813 patients who had detailed clinical information and assessed the clinical value of several characteristics by univariate and multivariate Cox regression analyses. In line with other reports [19,20], black patients with adenocarcinoma in villous adenoma had a poor prognosis, which may be attributable to multiple factors, such as diet, the microbiome composition of the bowel and healthcare access [21,22]. Similarly, age at diagnosis was an independent risk factor, which is consistent with guideline recommendations to begin CRC screening at 50 years of age, while sex was not a prognostic factor in our study.
In contrast to the findings of previous studies [19,23], pathological grade, which is a known prognostic factor, was not identified as an independent prognostic factor for the survival of patients with adenocarcinoma in villous adenoma. Additionally, TNM stage is known to be significantly associated with patient survival, and we also demonstrated that it could act as an independent predictive factor. Tumor size greater than 5 cm was considered a risk factor in our study because large tumors are not sensitive to chemotherapy and are more likely to be invasive [24]. Regarding the number of tumors, we found that it was an independent risk factor for OS, which is consistent with the findings of a previous report [25]. However, the number of tumors was not related to CSS, which suggests that the number of tumors mainly affects the rate of death due to other causes.
Nomograms have been successfully established to predict the survival of many tumor types and are considered more accurate than the 7th AJCC staging system [26][27][28]. To the best of our knowledge, no nomogram has been established to predict the survival of patients with adenocarcinoma in villous adenoma. Based on the results of multivariate analysis, we constructed a nomogram to evaluate the CSS of patients using the SEER database. For the nomogram predictions of 1-, 3- and 5-year CSS, age, T stage, N stage, and M stage were included in the analysis. The C-index, which measures the concordance between the predicted probability and the actual event, was 0.716 (95% CI, 0.684-0.773) in the internal cohort and 0.713 (95% CI, 0.641-0.794) in the external cohort, which indicated that the nomogram was reliable. However, race and tumor size were not used to build the nomogram because including them increased the AIC value. AIC is considered an important criterion for variable selection and has been used in many studies [29,30]. Moreover, according to the results of the ROC curve and DCA, the nomogram has better clinical usability than the 7th TNM staging system. Therefore, to some extent, the prognosis of patients could be evaluated with the nomogram rather than TNM staging because of its higher reliability. According to the total score, we could determine whether patients need further chemotherapy after surgery and thereby individualize treatment. In a future study, we will refine this work by collecting data from our own patients, and we will perform experiments to investigate the differences between adenocarcinoma in villous adenoma and conventional colorectal cancer.
Our study has some limitations that should be noted. First, the TNM stage we analyzed was based on the 7th AJCC staging system, which is not the latest edition and may reduce the effectiveness of the comparison. Second, our nomograms were constructed using only the SEER database, leading to potential selection bias. However, we developed the nomogram and verified its validity in the internal and external cohorts, which made our results more reliable. In addition, the use of AIC could improve our model by avoiding overfitting and underfitting. Although this nomogram performed well in the two cohorts, it should be applied with great caution when assessing the risk of 1-, 3- and 5-year survival. In the future, we will collect relevant data to incorporate the factors above into further research. Third, our manuscript did not include other characteristics, such as hematological biomarkers and molecular parameters. As some studies have suggested, combining hematological biomarkers, such as HGB, neutrophils and LDH, can improve the predictive ability of a nomogram [31], while molecular parameters, including miRNA, CpG methylation and circular RNA, have been demonstrated to be useful for predicting the survival of patients [32][33][34]. Therefore, we will refine this work in a future study by combining these characteristics.

Conclusions
In this study, age at diagnosis, tumor size, T stage, N stage, race and M stage were identified as risk factors for CSS in our patient sample. In addition, we constructed nomograms to predict the survival of patients and found that, compared with 7th TNM staging, the nomograms could serve as a good and effective tool for survival evaluation, as assessed by calibration plots and ROC curves.