Development and validation of prognostic nomogram for germ cell testicular cancer patients

The purpose of our study was to establish a reliable and practical nomogram based on significant clinical factors to predict the overall survival (OS) and cancer-specific survival (CSS) of patients with germ cell testicular cancer (GCTC). Data on patients diagnosed with GCTC between 2004 and 2015 were obtained from the SEER database. Nomograms were constructed using the R software to predict the OS and CSS probabilities, and the constructed nomograms were validated and calibrated. A total of 22,165 GCTC patients were enrolled in the study, comprising the training cohort (15,515 patients) and the validation cohort (6,650 patients). In the training cohort, multivariate Cox regression showed that age, race, AJCC stage, SEER stage and surgery were independent prognostic factors for OS, while age, race, AJCC stage, T stage, M stage, SEER stage and radiotherapy were independent prognostic factors for CSS. Based on these Cox regression results, we constructed prognostic nomograms of OS and CSS in GCTC patients and found that the OS nomogram had a higher C-index and AUC than TNM stage in the training and validation cohorts. In addition, in the training and external validation cohorts, the calibration curves showed good consistency between the predicted and actual 3-, 5- and 10-year OS and CSS rates of the nomogram. The current prognostic nomogram can provide a personalized risk assessment for the survival of GCTC patients.


INTRODUCTION
Testicular cancer (TC) is a rare malignant tumor of the genitourinary system, accounting for about 5% of genitourinary tumors [1]. In 2018, there were 71,105 new cases of TC (1.7% of male incidence) and 9,507 deaths (0.2% of male mortality) worldwide [2]. Despite its low overall incidence, TC is the most common malignant tumor in men aged 15-34 years [3].
Germ cell TC (GCTC) is the most common type of TC. GCTC is mostly unilateral, with only about 1% of cases bilateral [4]. The main risk factor for GCTC is cryptorchidism, which occurs in 2-5% of boys born at term [5]. Other risk factors include gonadal dysgenesis and genetic diseases such as Down's syndrome [6,7]. Although the overall 10-year cancer-specific mortality (CSM)-free survival rate in patients with GCTC is approaching 95%, the incidence has increased significantly over the past 30 years [8,9]. Therefore, further research is important to determine the predictors that may affect the long-term survival of GCTC patients.
The American Joint Committee on Cancer (AJCC) tumor node metastasis (TNM) staging system is widely used to evaluate the prognosis of patients with GCTC. However, other factors such as age, race, SEER stage, surgery and radiotherapy can also affect the outcome of GCTC patients. A nomogram, which is based on equations derived from the regression coefficients of each variable, integrates many prognostic factors and can therefore predict individual survival more accurately [10]. The nomogram can incorporate important clinicopathological and demographic variables in clinical practice to create a more comprehensive prognostic evaluation system.
In this study, we analyzed the clinicopathological features and prognostic factors of GCTC patients using the Surveillance, Epidemiology, and End Results (SEER) database. Based on the results of survival analysis, we further developed and validated the prognostic nomogram for patients with GCTC to better predict the patient's prognosis.

Demographic and clinicopathologic characteristics
Our study cohort included 22,165 eligible GCTC patients diagnosed from 2004 to 2015, with 15,515 patients in the training cohort and 6,650 patients in the validation cohort. Table 1 shows the demographic and clinical characteristics of patients with GCTC. In the entire cohort, the majority of GCTC patients were white (90.3%), and the age of onset was concentrated between 21 and 40 years (65.8%). The most common categories were AJCC stage I (77.2%), T1 stage (66.9%), N0 stage (80.3%), M0 stage (90.8%) and localized stage (72.2%). In addition, nearly 99.9% of patients received surgery while only 20.2% received radiotherapy.

Survival of patients with GCTC
By analyzing the Kaplan-Meier curves with a log-rank test, we found that age at diagnosis, race, AJCC stage, T stage, N stage, M stage, SEER stage, surgery and radiotherapy (all p<0.05) were associated with OS and CSS of GCTC patients (Table 2). In the whole cohort, the 3-, 5- and 10-year OS of GCTC patients were 96.4%, 95.6% and 93.6%, respectively, and the 3-, 5- and 10-year CSS were 97.7%, 97.4% and 97.2%, respectively. We found that patients aged 21-40 and those who received chemotherapy had a higher survival rate.

Identification of prognostic factors of OS and CSS in GCTC patients
Univariate and multivariate Cox regression were used to analyze the factors associated with OS and CSS in patients with GCTC. In the training cohort, univariate Cox regression analysis showed that age at diagnosis, race, AJCC stage, T stage, N stage, M stage, SEER stage, surgery and radiotherapy were associated with OS and CSS in GCTC patients. After all of these factors were included in the multivariate Cox regression analysis, we found that T stage, N stage, M stage and radiotherapy were not independent risk factors for OS, while N stage and surgery were not independent risk factors for CSS (Table 3).
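For reference, the quantities reported by a Cox regression can be written out as follows. This is the standard proportional hazards formulation rather than a formula taken from this study; the symbols $h_0$, $\beta_j$, $x_j$ and $\mathrm{SE}$ are notation introduced here for illustration.

```latex
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j} \beta_j x_j\Big), \qquad
\mathrm{HR}_j = \exp(\beta_j), \qquad
95\%\ \mathrm{CI}_j = \exp\!\big(\hat{\beta}_j \pm 1.96\,\mathrm{SE}(\hat{\beta}_j)\big)
```

Under this formulation, a factor is typically regarded as an independent prognostic factor when the confidence interval of its hazard ratio in the multivariate model excludes 1.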

Prognostic nomograms for OS and CSS
In the training cohort, we developed two types of nomogram for OS and CSS: one based on the independent risk factors associated with prognosis in the multivariate Cox regression analysis, and the other based on TNM stage. The length of the line corresponding to each variable in the nomogram represents the influence of that predictive variable on the survival outcome. For the nomograms generated by multivariate Cox regression analysis, age contributed the least to survival outcome in the OS nomogram, and SEER stage made the greatest contribution to survival outcome in the CSS nomogram, followed by age (Figure 1). In both the OS and the CSS nomograms generated from T stage, N stage and M stage, M stage made the greatest contribution to survival outcome.
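The relationship between line length and influence can be illustrated with a minimal Python sketch. It assumes the common convention of rescaling Cox coefficients so that the variable with the widest coefficient range spans 0 to 100 points; the function name `nomogram_points` and all coefficient values are hypothetical and are not taken from Table 3 or from the R code used in this study.

```python
def nomogram_points(coefficients):
    """Convert Cox log-hazard coefficients into 0-100 nomogram points.

    coefficients: {variable: {level: beta}}, with beta = 0 for the
    reference level of each variable.
    """
    # The spread of coefficients within a variable is the length of its
    # line on the nomogram.
    ranges = {var: max(levels.values()) - min(levels.values())
              for var, levels in coefficients.items()}
    widest = max(ranges.values())
    if widest == 0:
        raise ValueError("all coefficients are identical")
    points = {}
    for var, levels in coefficients.items():
        lowest = min(levels.values())
        # The most influential variable reaches exactly 100 points.
        points[var] = {level: 100.0 * (beta - lowest) / widest
                       for level, beta in levels.items()}
    return points


# Hypothetical coefficients, invented for illustration only.
example = {
    "SEER stage": {"localized": 0.0, "regional": 0.5, "distant": 2.0},
    "age": {"<=40": 0.0, ">40": 1.0},
}
points = nomogram_points(example)
# Total points for a patient are the sum over that patient's levels.
total = points["SEER stage"]["distant"] + points["age"][">40"]
print(points, total)
```

In this toy input, the variable with the largest coefficient range gets the longest line, and a patient's total points are then mapped to a predicted survival probability on the nomogram's bottom axes.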

Validation and calibration of the nomograms
Analysis of the time-dependent ROC curves for OS shows that the AUC of the nomogram (training cohort: AUC=0.763; validation cohort: AUC=0.765) was significantly larger than that of TNM stage (training cohort: AUC=0.717; validation cohort: AUC=0.734), whereas the ROC curves for CSS were similar (Figure 2). Moreover, we evaluated the predictive performance of the nomograms for 3-, 5- and 10-year OS and CSS in the training and validation cohorts and found that they provided a good assessment of OS and CSS at 3, 5 and 10 years in GCTC patients (Figure 3 and Supplementary Figure 2).
To compare the predicted survival time with the actual survival time, the C-index and calibration curves were used to verify the nomograms in the training and validation cohorts. We found that the C-index of the OS nomogram was larger than that of the TNM stage in both the training and validation cohorts. In both cohorts, the CIC results show that the nomograms produced by the multivariate Cox regression analysis classified patients as positive across a broad range of thresholds for OS, and that their number of true positives was greater than that of the TNM stage nomograms (Figure 6 and Supplementary Figure 4).

DISCUSSION
In this study, we first established prognostic nomograms of OS and CSS in patients with GCTC. We performed Cox regression analysis on a large number of GCTC patients using the SEER database and found that age at diagnosis, race, AJCC stage and SEER stage were independent risk factors for OS and CSS. We constructed two prognostic nomograms: one based on multivariate Cox regression analysis and the other on TNM stage.
By examining the C-index, ROC curve, DCA curve and CIC, we found that the nomogram based on multivariate Cox regression analysis has better prognostic ability for OS than the TNM stage nomogram and similar prognostic ability for CSS. In addition, we verified and calibrated the established nomograms and evaluated the accuracy of the OS and CSS nomograms at 3, 5 and 10 years. The results show good consistency between the nomogram predictions and the actual observations, as well as good reliability in both internal and external verification.
The TNM staging classification system is the most widely used tumor staging system in the world and the foundation of GCTC prognosis [11]. TNM stage is determined from the results of laboratory tests and postoperative pathological examinations [12]. In this classification system, clinicians determine TNM stage based on the depth of tumor invasion (T), the number of lymph node metastases (N) and distant metastasis (M). For cancer patients, a higher TNM stage means more complex drug treatment and a shorter survival time. How best to combine patients' tumor characteristics with their own clinical factors to make a tailored risk assessment has been a challenge for clinicians [13].
The nomogram is a predictive tool: a graphical representation of a multivariate prognostic regression analysis that makes the prognostic factors easier to visualize [14,15]. The model integrates a variety of prognostic factors and is well suited to evaluating the survival probability of individual patients [16]. At present, many cancer nomograms have been developed and have shown more accurate predictions of cancer prognosis than traditional TNM systems [17,18]. In addition, nomograms allow clinicians to incorporate more prognostic factors and assess the patient's physical condition more intuitively in order to evaluate the personalized prediction for clinical trial participation. Therefore, it is of great significance to establish an effective and reliable nomogram for the prognosis of patients with GCTC and to provide them with individualized treatment.
Nomograms have been widely used in various urinary malignancies and are of great significance for individualized and accurate prediction of prognosis [19-21]. Karakiewicz et al. [22] performed preoperative prediction in 726 patients treated with radical cystectomy and bilateral pelvic lymphadenectomy, and found that the multivariate nomogram was more accurate than prediction based on TUR T stage alone. Similarly, Kattan et al. [23] constructed a nomogram that included pre-treatment serum prostate-specific antigen levels, biopsy Gleason scores and clinical stages, and found that it could predict the 5-year probability of treatment failure in patients with clinically localized prostate cancer who underwent radical prostatectomy. Zhou et al. [24] found that a nomogram and the Aggtrmmns scoring system can effectively predict the OS and CSS of kidney cancer patients. In our study, we developed a nomogram based on age, race, AJCC stage, TNM stage, SEER stage, surgery and radiotherapy, and this nomogram showed a better ability to predict prognosis than the TNM stage nomogram. Using this nomogram, urologists can evaluate the prognostic survival of patients with GCTC, making personalized treatment and monitoring possible.
There are limitations to be recognized in this study. First, this was a retrospective study, with the attendant limitations of sample and ethnic selection bias; prospective studies with more cases are needed. Second, the SEER database has certain limitations regarding the type and duration of treatment and the recurrence of disease, and we could not obtain detailed information on radiotherapy (dose, beam energy and fractionation). Moreover, information on the patient's physical condition and complications, both of which are prognostic factors for patients with GCTC, is lacking.
Based on a large body of population data, we developed a prognostic nomogram for GCTC patients, which can accurately and reliably predict the 3-, 5- and

Patient selection
The data presented in our study were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database, which is funded by the National Cancer Institute.

Statistical analysis
We randomly assigned 70% of patients to the training cohort (n=15,515) and the remaining 30% to the validation cohort (n=6,650). Kaplan-Meier curves were used to estimate the OS and CSS of GCTC patients, and the differences between curves were analyzed by the log-rank test. Univariate and multivariate Cox regression models were used to estimate hazard ratios (HR) and 95% confidence intervals (CI) and to identify independent prognostic factors of GCTC.
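The cohort split and the Kaplan-Meier estimate can be sketched in a few lines of Python. This is an illustrative sketch only: the analysis in this study was carried out in R, and the function names `split_cohort` and `kaplan_meier`, the random seed and the toy data are assumptions introduced here.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and validation lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    # Truncate so that 22,165 patients give 15,515 / 6,650.
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time of each patient
    events: 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


train, validation = split_cohort(range(22165))
print(len(train), len(validation))
# Toy follow-up data (months), invented for illustration.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients do not lower the survival estimate directly, but they shrink the risk set used at later event times, which is why the curve drops by a larger fraction after a censoring.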
Using R software, we constructed two nomograms to predict the OS and CSS probabilities of individual patients: one based on the multivariate Cox regression analysis and the other based on TNM stage. We first used the R software to generate the receiver operating characteristic (ROC) curve for the two nomograms and determined the area under the curve (AUC). In addition, by comparing the predicted survival time with the observed survival time, the predictive performance of the nomogram was evaluated using the concordance index (C-index) and calibration curves, and the nomogram was calibrated for 3-, 5- and 10-year OS and CSS. The C-index is similar to the AUC but is better suited to censored data. The value of the C-index ranges from 0.5 (no discrimination) to 1 (perfect discrimination), and a higher C-index indicates a better prognostic model. These evaluations were performed using a bootstrap with 1000 resamples.
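How the C-index handles censored data can be seen in a minimal Python sketch of Harrell's concordance index. We assume this common definition here; the study does not state which variant or R package was used, and the function name `concordance_index` and the toy data are introduced for illustration only.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when the patient with the shorter
    follow-up time had an observed event. The pair is concordant when
    that patient also has the higher predicted risk; tied risk scores
    count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have died before patient j's last follow-up.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration: the third patient is censored.
times = [5, 3, 8, 1]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [0.2, 0.5, 0.1, 0.9]))  # perfect ranking
print(concordance_index(times, events, [0.3, 0.3, 0.3, 0.3]))  # uninformative
```

Pairs in which the earlier follow-up ended in censoring are skipped, because it is unknown which of the two patients survived longer.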
The C-index has no direct clinical interpretation. Therefore, we also performed decision curve analysis (DCA), a novel method that evaluates a predictive model in terms of its net benefit from the perspective of clinical outcome, and plotted the clinical impact curve (CIC) based on the results of the DCA.
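The net benefit underlying a decision curve can be sketched as follows. This Python sketch makes a simplifying assumption: the outcome at the chosen time horizon is known for every patient (no censoring), which is not true of the survival data analyzed here. The function names and toy data are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of intervening when predicted risk >= threshold.

    outcome: 1 if the event occurred by the time horizon, 0 otherwise.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcome)
    pairs = list(zip(predicted_risk, outcome))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold.
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: intervene in every patient."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def clinical_impact(predicted_risk, outcome, threshold):
    """Counts plotted in a CIC: (patients flagged high risk, true positives)."""
    flagged = [y for p, y in zip(predicted_risk, outcome) if p >= threshold]
    return len(flagged), sum(flagged)


# Toy predictions and outcomes, invented for illustration.
risk = [0.9, 0.8, 0.3, 0.1]
died = [1, 0, 1, 0]
for pt in (0.2, 0.5):
    print(pt, net_benefit(risk, died, pt), net_benefit_treat_all(died, pt),
          clinical_impact(risk, died, pt))
```

A decision curve plots the net benefit against the threshold for the model and for the treat-all and treat-none (net benefit 0) strategies; the CIC plots the two counts returned by `clinical_impact` over the same thresholds.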

Research involving Human Participants and/or Animals
This article does not contain any studies with human participants or animals performed by any of the authors.