A novel nomogram based on cardia and chemotherapy to predict postoperative overall survival of gastric cancer patients

Background: We aimed to establish and externally validate a nomogram to predict the 3- and 5-year overall survival (OS) of gastric cancer (GC) patients after surgical resection and explored the roles of cardia and chemotherapy. Methods: A total of 6543 patients diagnosed with primary GC during 2004-2016 were collected from the Surveillance, Epidemiology and End Results (SEER) database. We grouped patients diagnosed during 2004-2012 into a training set (n=4528) and those diagnosed during 2013-2016 into an external validation set (n=2015). A nomogram was constructed after univariate and multivariate analysis. Performance was evaluated by Harrell’s C-index, area under the receiver operating characteristic curve (AUC), decision curve analysis (DCA), and calibration plot. Results: The multivariate analysis identified age, race, location, tumor size, T stage, N stage, M stage, and chemotherapy as independent prognostic factors. In multivariate analysis, the hazard ratio (HR) of noncardia was 0.762 (P<0.001), and that of chemotherapy was 0.556 (P<0.001). Our nomogram was found to exhibit excellent discrimination: in the training set, Harrell’s C-index was superior to that of the 8th American Joint Committee on Cancer (AJCC) TNM classification (0.736 vs 0.699, P<0.001); the C-index was also better in the validation set (0.748 vs 0.707, P<0.001). The AUCs for 3- and 5-year OS were 0.806 and 0.815 in the training set and 0.775 and 0.783 in the validation set, respectively. The DCA and calibration plot of the model also showed good performance. Conclusions: We established a well-designed nomogram to accurately predict the OS of primary GC patients after surgical resection. We also further confirmed the prognostic value of cardia and chemotherapy in predicting the survival rate of GC patients.


Introduction
Gastric cancer (GC) remains the fifth most common cancer and the third leading cause of cancer-related death, following lung cancer and colorectal cancer, in both sexes [1]. More than one million people are diagnosed with GC annually, and the death toll is close to 800,000 [1]. The incidence among males is 2- to 3-fold higher than that among females (32.1 vs 13.2 per 100,000) in Eastern Asia, whereas the rate in Northern America is generally low [1].
GC can be classified as cardia and non-cardia, which have different epidemiology and causes [2,3]. The incidence of non-cardia GC has declined over the past 30 years; however, cardia GC rates have remained stable or even increased [2,4,5]. The poor prognosis of cardia GC compared with non-cardia GC has been reported [6,7], but whether cardia GC is an independent prognostic factor remains unknown.
Surgery is still the primary treatment for advanced gastric cancer (AGC), for which D2 lymphadenectomy has been widely carried out in Asia [8]. A study from Japan of 118,367 patients after surgical resection showed that the 5-year OS rate was 71.1% [9]. However, recurrence occurs in approximately 20-50% of all patients after surgery [10]. Therefore, identifying prognostic factors is indispensable in choosing treatment methods and surveillance strategies.
A nomogram is a useful predictive tool for cancer due to its accuracy, practicability, and good discrimination. It can quantify an individual's survival rate in graphic form and has been used for many tumors [11-13]. The classic nomogram for GC is the Memorial Sloan Kettering Cancer Center (MSKCC) nomogram created in 2003 [14]. Compared with the traditional staging system, the American Joint Committee on Cancer (AJCC) TNM classification, a nomogram incorporates more demographic and clinicopathologic factors into the model.
The 8th AJCC staging system took effect in 2018, but few studies have compared nomograms with this new edition. In addition, the role of chemotherapy in the prognosis of GC has been mentioned, but no nomograms have included chemotherapy as a variable to date [8,15]. Finally, most of the established nomograms for GC are complicated, are only internally validated, or have a small training set [8,10,15-17]. Consequently, we aim to establish and externally validate a relatively simple, generalized nomogram to predict the overall survival (OS) of primary GC patients after surgical resection. We hope to determine the value of identifying GC as cardia versus non-cardia while exploring the role of adjuvant chemotherapy.
The performance of the nomogram is also compared with the 8th AJCC staging system.
TNM staging was recoded according to the 8th AJCC TNM classification. The inclusion criteria were as follows: primary GC after surgical resection; no other malignancies; positive histological confirmation; no preoperative radiotherapy; more than 16 examined lymph nodes (LNs); and complete clinical data without missing values. The detailed enrollment process is presented in Figure 1. Cases coded as overlapping lesions or unspecified lesions were excluded. Finally, a total of 6543 cases with complete clinical data were included in our study. We grouped them into a training set (n=4528) and an external validation set (n=2015) according to year at diagnosis (2004-2012 and 2013-2016, respectively). Comparisons of demographic and clinicopathologic variables between the training and validation sets were generated using the "table1" function in R software.
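The temporal split described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code (the study used R): the function names and the `year` field are hypothetical, and the other inclusion criteria are assumed to have been applied beforehand.

```python
from typing import Dict, List, Optional

# Year ranges taken from the text; range() excludes its upper bound.
TRAINING_YEARS = range(2004, 2013)    # 2004-2012 inclusive
VALIDATION_YEARS = range(2013, 2017)  # 2013-2016 inclusive


def assign_cohort(year_of_diagnosis: int) -> Optional[str]:
    """Return the cohort for a year of diagnosis, or None if out of range."""
    if year_of_diagnosis in TRAINING_YEARS:
        return "training"
    if year_of_diagnosis in VALIDATION_YEARS:
        return "validation"
    return None


def split_by_year(records: List[dict]) -> Dict[str, List[dict]]:
    """Split records (dicts with a hypothetical 'year' key) into two cohorts."""
    cohorts: Dict[str, List[dict]] = {"training": [], "validation": []}
    for record in records:
        cohort = assign_cohort(record["year"])
        if cohort is not None:
            cohorts[cohort].append(record)
    return cohorts
```

Splitting by calendar period rather than at random means the validation patients come from a later era than the training patients, which is the property the study relies on when it calls the validation external.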

Construction of the Nomogram
The cutoff values of continuous variables were determined using X-tile software designed by the Yale School of Medicine and our clinical experience. We divided patients into two groups according to age (<70 or ≥70 years) and into three groups according to tumor size (<2 cm, 2-10 cm, or ≥10 cm/diffuse). For the race (ethnicity) variable, "other" included American Indian/AK Native and Asian/Pacific Islander. The SEER database classifies tumor histology (grade) into 4 groups: well differentiated (grade I), moderately differentiated (grade II), poorly differentiated (grade III), and undifferentiated/anaplastic (grade IV). We integrated poorly differentiated and undifferentiated/anaplastic tumors into a single group ("Poorly").
Location was further stratified into cardia and non-cardia (including fundus, body, antrum and pylorus, lesser and greater curvature).
After univariate and multivariate analyses, independent prognostic factors were identified by the forward stepwise selection method. The proportional hazards (PH) assumption was examined before the multivariate analysis to ensure that the variables satisfied it. Variables with P<0.1 in the univariate analysis were further analyzed with the Cox PH regression model. A nomogram was then constructed to predict 3- and 5-year OS for primary GC patients after surgery. Kaplan-Meier (KM) survival curves were constructed and compared with the log-rank test.
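The variable coding described above can be written as a few small Python functions. This is a minimal sketch under stated assumptions: the function names and the labels "Well" and "Moderately" are hypothetical, and a tumor of exactly 10 cm is assigned to the largest group because the text writes that group as "≥10 cm".

```python
def age_group(age_years: float) -> str:
    """Dichotomize age at the 70-year cutoff reported in the text."""
    return "<70" if age_years < 70 else ">=70"


def size_group(size_cm: float, diffuse: bool = False) -> str:
    """Three tumor-size groups; diffuse tumors join the largest group."""
    if diffuse or size_cm >= 10:
        return ">=10 cm/diffuse"
    if size_cm < 2:
        return "<2 cm"
    return "2-10 cm"


def grade_group(seer_grade: int) -> str:
    """Merge SEER grades III and IV into the single 'Poorly' group."""
    labels = {1: "Well", 2: "Moderately", 3: "Poorly", 4: "Poorly"}
    return labels[seer_grade]


def location_group(site: str) -> str:
    """Cardia versus every other gastric subsite (non-cardia)."""
    return "cardia" if site.strip().lower() == "cardia" else "non-cardia"
```

For example, `location_group("Antrum")` returns `"non-cardia"`, and `grade_group(4)` returns `"Poorly"`.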
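For readers unfamiliar with the KM method mentioned above, the product-limit estimator can be sketched in plain Python. This is an illustration, not the authors' implementation (the study used the R survival package); the function names are hypothetical, and events are coded 1 for death and 0 for censoring.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times  : follow-up times (any consistent unit, e.g. months)
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Read the estimated survival probability at time t from a KM curve."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s
```

With follow-up recorded in months, `survival_at(curve, 36)` and `survival_at(curve, 60)` would give the estimated 3- and 5-year OS.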

Nomogram Performance
The performance of our nomogram was evaluated by discrimination and calibration in both the training and validation sets. Discrimination was evaluated using Harrell's C-index. The principle of the C-index has been described by Han et al. [8]. The P value for the comparison of our nomogram with the AJCC staging system was obtained using the "compareC" function in R software. The prediction was further evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) and the net benefit of decision curve analysis (DCA). Calibration was carried out by comparing the means of the nomogram-predicted survival rate with the actual OS measured by the KM method. Bootstrapping was performed with 1000 resamples. To achieve external validation, the predicted total points were added as a new variable to the established nomogram. Calibration plots of 3- and 5-year survival in the training set and 3-year survival in the validation set were constructed.
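As a rough illustration of what Harrell's C-index measures, the sketch below counts concordant patient pairs in plain Python. It is a simplified version, not the routine used in the study: the function name is hypothetical, pairs with tied survival times are ignored, and a higher risk score is assumed to mean a worse predicted prognosis (for example, more nomogram points).

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of usable pairs in which the higher-risk patient died first.

    A pair (i, j) is usable when patient i died (events[i] == 1) and
    patient j was followed for longer than i. Tied risk scores count 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the nomogram and the AJCC stage are compared.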

Statistical Analysis
Statistical analysis was performed using SPSS version 22.0 (SPSS, Chicago, IL, USA) and R software version 4.0.1 via the rms, survival, table1, compareC, and ggplot2 packages. All tests were two-sided, and a P value<0.05 was considered statistically significant. This study did not require local ethics approval.

Analysis and Development of the Nomogram
Selected variables and hazard ratios (HRs) after univariate and multivariate analyses are listed in Table 2. We identified age, race, location, T stage, N stage, M stage, tumor size, and chemotherapy as independent prognostic factors associated with OS for GC patients. Due to a lack of significance, sex was excluded from the Cox PH regression model (HR: 0.973, 95% CI: 0.903-1.049).
Among the patients included in our research, HRs were found to be significantly higher for individuals who had the following characteristics: older than 70, male, black, cardia, poorly differentiated disease, deeper invasion, more LN metastasis, distant metastasis, larger tumor size, and no chemotherapy. Of note, after adjustment for the multivariate analysis, the HR for location was 0.762 (95% CI: 0.699-0.831, P<0.001), indicating that the non-cardia type is an independent protective factor for GC prognosis. There are two distinct discrepancies between the univariate and multivariate analyses. Although grade was statistically significant in the univariate analysis, it seemed to be nonsignificant when adjusted by the multivariate model. Considering that grade represents histologic differentiation and is of clinical value, we still included it in the model.
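The reported HRs and confidence intervals can be checked for internal consistency with a short calculation. The sketch below assumes the intervals are standard Wald intervals on the log-HR scale, which the text does not state explicitly; the function name is hypothetical. It recovers the standard error from the interval width and derives a two-sided P value.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover log(HR), its standard error, the Wald z, and a two-sided P.

    Assumes the 95% CI was built as exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return log_hr, se, z, p


# Values quoted in the text: location (non-cardia) and sex.
_, _, z_location, p_location = wald_from_ci(0.762, 0.699, 0.831)
_, _, z_sex, p_sex = wald_from_ci(0.973, 0.903, 1.049)
```

Under this assumption, the location HR gives a P value far below 0.001, and the sex HR gives a P value well above 0.05, in line with the significance statements in the text.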

Performance of the Nomogram
Our nomogram exhibited excellent discrimination ability. In the training set (Table 3), the C-index was 0.736 (95% CI, 0.726-0.746), which was superior to that of the 8th AJCC TNM classification (C-index, 0.699; 95% CI, 0.689-0.709, P<0.001). In the validation set, the C-index was also better than that of the AJCC staging system (0.748 [95% CI, 0.726-0.770] vs 0.707 [95% CI, 0.684-0.730], P<0.001). In addition, the ROC curves of the nomogram exhibited great predictive ability in both the training and validation sets, with AUCs of 0.806 and 0.815 at 3 years and 5 years in the training set, respectively (Figure 5A, 5B). In the validation set (Figure 5C, 5D), the AUCs were only slightly reduced (0.775 and 0.783 for 3- and 5-year OS, respectively).
The DCA results further demonstrated the performance of our nomogram (Figure 6). In both the training set (Figure 6A, 6C) and the validation set (Figure 6B, 6D), our nomogram had a larger net benefit than the AJCC TNM classification. This favorable effect held across threshold probabilities from 0.05 to 0.45 for 3-year OS and to 0.6 for 5-year OS.
The calibration plots also showed good agreement between the nomogram-predicted and actual 3- and 5-year survival in the training set and 3-year survival in the validation set (Figure 7). The 5-year curve in the validation set could not be constructed because of the short follow-up time (patients were diagnosed during 2013-2016). The diagonal line represents the ideal situation, and the predicted survival corresponded closely with the actual OS.
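The net benefit plotted in a decision curve can be illustrated with the usual formula, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The sketch below is a simplified version for a binary outcome at a fixed time point (for example, death within 3 years) and ignores censoring, which a survival DCA would need to handle; the function names are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    predicted_risk : predicted probabilities of the event
    outcome        : 1 if the event occurred, 0 otherwise
    threshold      : threshold probability pt, with 0 < pt < 1
    """
    n = len(outcome)
    pairs = list(zip(predicted_risk, outcome))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Evaluating `net_benefit` over a grid of thresholds for two models, alongside the treat-all and treat-none (net benefit 0) references, yields curves of the kind compared in a DCA figure.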

Discussion
In the current study, we developed and externally validated a nomogram to predict 3- and 5-year OS for primary GC patients after surgical resection. We identified age, race, location, tumor size, T stage, N stage, M stage, and chemotherapy as independent prognostic factors, among which the number of metastatic LNs held the most weight. Compared with the 8th AJCC TNM classification, our nomogram performed better in both the training and external validation sets. To some extent, the 8th AJCC staging system did not perform poorly. Nevertheless, our nomogram can predict individualized survival probability more precisely.
Some nomograms classified GC location into the upper, middle, and lower third [8,15]. In this study, we divided all GC types into cardia and non-cardia (the survival curves of the middle third and lower third were similar in our cohort; data not shown). As a result, we found that cardia GC had a worse prognosis than non-cardia GC (P<0.001). Our finding is consistent with a systematic review that found patients with upper third GC had significantly increased all-cause mortality [19]. When gastroesophageal junction (GEJ) cases were excluded, the prognosis of pure cardia GC was even worse. Our data also show that sex was not an independent prognostic factor, which is inconsistent with previous findings [8,10,14,15,20]. Although males and females differ in terms of incidence rate, their prognoses appear to be similar.
Previously, Kim et al. found that age had nonlinear effects on HR [10]. Another study also found that patients older than 70 years had the lowest 5-year OS compared with younger and middle-aged patients [21]. Their results are consistent with our analysis using X-tile software, so we chose to convert age into a categorical variable at 70. Although grade is closely associated with malignant behavior and distant metastasis, it does not seem to be an independent factor affecting prognosis in our study. Accordingly, its P value was no longer significant in the multivariate analysis.
Another discrepancy in this study pertains to chemotherapy. Recent studies have shown that adjuvant chemotherapy after surgery could benefit patients in terms of survival probability [22]. A meta-analysis showed that, compared with surgery alone, fluorouracil-based postoperative adjuvant chemotherapy significantly reduced the mortality of GC patients [23]. Another phase III randomized controlled trial (RCT) revealed that chemotherapy using capecitabine plus oxaliplatin for half a year after D2 gastrectomy improved the 3-year disease-free survival of GC compared with surgery alone (74% vs 59%, HR: 0.56, P<0.001) [24]. The results of our multivariate analysis further demonstrated that chemotherapy acted as a protective factor against poor outcomes (Figure 3). We believe its failure to show statistical significance in the univariate analysis was largely due to some confounding factors. To the best of our knowledge, we are the first to include chemotherapy in the construction of a nomogram for GC.
As in most previous studies, we excluded patients with fewer than 16 examined LNs [8]. This helps to ensure surgical quality and prevent the stage migration effect [8,25]. In our study, the median numbers of examined LNs were 23 and 24 in the training and validation sets, respectively.
Quite a few studies have used a randomly assigned (data-splitting) method to create a validation set [8,16,17]. However, this method is, in theory, closer to internal validation; it also wastes samples and leaves insufficient power for evaluation. In contrast, our external validation set was established according to year at diagnosis (training set, 2004-2012; validation set, 2013-2016). In the calibration plot, the predicted survival corresponded closely with the actual 3- and 5-year OS (Figure 7).
Notably, 655 patients had distant metastasis (M1) but underwent surgery. Among them, 58.6% (384/655) received chemotherapy, and 15.1% (99/655) received radiotherapy. A growing number of studies have shown that patients with unresectable stage IV GC can achieve good survival outcomes if they undergo radical gastrectomy after responding to several combined chemotherapy regimens [26]. This novel strategy is called conversion surgery, a treatment approach in which initially unresectable tumors become curable after chemotherapy response. If R0 resection is achieved, conversion surgery can significantly improve the patient survival rate [26]. We thus did not exclude such patients and hope that our nomogram can be used with these patients to predict OS after surgery. Nevertheless, this concept is still controversial, and current cancer guidelines do not recommend surgery for stage IV patients.
Regarding the surgical approach, one study revealed that distal gastrectomy had comparable long-term survival to total gastrectomy for middle and lower third GC when R0 resection was achieved [27]. Apart from conventional open surgery, laparoscopic gastrectomy has also been performed in recent years [28]. RCTs comparing laparoscopic with open distal gastrectomy showed that the former had noninferior 3-year survival in both early and advanced GC [29,30]. For early gastric cancer (EGC), endoscopic treatments such as endoscopic submucosal dissection (ESD) are becoming increasingly popular, with the advantages of minimal invasiveness and few postoperative complications [31]. Researchers in Japan analyzed 1956 patients with EGC undergoing curative ESD and found that the 5-year OS rate was 92.6% [31]. Another study predicting the risk of lymph node metastasis (LNM) in EGC after radical ESD suggested that patients with a predicted risk of less than 3% may avoid subsequent gastrectomy, which could ultimately improve patient quality of life.
Our study has several strengths. First, we used the SEER database, a standardized and relatively comprehensive database with a large sample size. Data from 2004 onward were collected, and more than 6000 patients were ultimately included in our study. Second, to the best of our knowledge, we are the first to classify GC according to cardia/non-cardia in a nomogram, and we found good discrimination in survival outcomes. We are also the first to include chemotherapy in a nomogram for GC as an independent prognostic factor. Third, our nomogram is based on the existing 8th AJCC staging system; thus, it is widely available and highly convenient for clinical application. The performance of our nomogram was excellent, with a C-index of 0.748 in the external validation set.
Our study also has some limitations that should be noted. First, patients who did not receive chemotherapy and those with missing chemotherapy information are recorded together in the SEER database, which made it difficult to determine the significance of chemotherapy. Consequently, the actual role of chemotherapy in patient prognosis could be underestimated. Second, we did not further divide T4 and N3 stages in our results because 893 cases had T4 or N3 stage but lacked specific details. This may have sacrificed some precision but simplified the model.

Conclusion
In summary, we established and externally validated an elaborate nomogram to predict 3- and 5-year OS for primary GC after surgical resection. We believe that our nomogram can provide precise predictions in Western populations. Future studies are needed to further evaluate its performance and extend its applicability.
Abbreviations: HR, hazard ratio; CI, confidence interval; Ref, reference