Prognostic nomogram for adult patients with acute myeloid leukemia

Abstract
Acute myeloid leukemia (AML) is a hematopoietic malignancy. This study was designed to develop individualized prognostic nomograms to predict cancer-specific survival (CSS) and overall survival (OS) in AML. Clinical data of AML patients (n = 58,882) diagnosed from 1973 to 2014 were obtained from the Surveillance, Epidemiology, and End Results database. The patients were divided into a training cohort (n = 29,441) and a validation cohort (n = 29,441). The prognostic nomograms were built from clinical variables selected by a multivariate Cox regression model in the training cohort. The concordance index (C-index), calibration curves, and receiver operating characteristic curves were used to assess the performance of the nomograms. The predictors in the nomogram for CSS were AML subtype, age, sex, region, marital status, and chemotherapy, whereas the predictors for OS were AML subtype, age, sex, region, race, marital status, and chemotherapy. The C-indexes of the nomograms in internal validation for CSS and OS were 0.712 and 0.703, respectively, whereas the C-indexes in external validation for CSS and OS were 0.712 and 0.705, respectively. The areas under the receiver operating characteristic curves for CSS and OS were 0.799 (95% confidence interval: 0.792–0.806) and 0.809 (95% confidence interval: 0.803–0.816), respectively. The individualized prognostic nomograms could predict outcomes relatively accurately in adult patients with AML.


Introduction
Acute myeloid leukemia (AML) is a highly heterogeneous hematological malignancy derived from myeloid hematopoietic progenitor cells [1] and the most common type of myeloid malignancy in adults, with an incidence of 3.7 per 100,000 persons. [2] The clinical outcomes of AML patients are closely related to immune, molecular, and cytogenetic abnormalities, [3][4][5] as well as age at diagnosis, sex, marital status, insurance status, and county-level income. [6][7][8] Over the past few decades, the diagnosis and treatment of AML have improved, but the overall survival (OS) rate is still low, at less than 50%. [9] Therefore, prognostic models need to be established to provide evidence for the diagnosis and treatment of AML in clinical practice.
Nomogram models have been validated for prognosis in several malignancies and can provide good statistical predictions of survival probability. [10][11][12] A recent study built nomogram models that integrate mutated genes to analyze OS in older patients with AML. [13] In this study, we aimed to design nomogram models for predicting the survival probability of adult patients with AML, using Surveillance, Epidemiology, and End Results (SEER) data from 1973 to 2014. The SEER program in the National Cancer Institute's Division of Cancer Control and Population Sciences is the most reliable and comprehensive source of population-based cancer information in the United States and provides a large dataset for nomogram construction. AML subtype, sex, age at diagnosis, region, race/ethnicity, marital status, and chemotherapy from the SEER program were included in the nomogram analysis. The visual format of the nomogram helps convey an individual's prognosis so that physicians can choose treatment accordingly.
The SEER Program is part of the National Cancer Institute's Division of Cancer Control and Population Sciences. [14] The clinical data of AML patients diagnosed from 1973 to 2014 were obtained from the SEER database using the SEER*Stat program (version 8.3.5). [14] A total of 65,535 records were obtained. In the SEER data, the AML subtypes were classified according to the 3rd edition of the International Classification of Diseases for Oncology (ICD-O-3) and the WHO 2008 definitions. [15,16] The AML subtypes included in this study are as follows: 9840/3 - acute erythroid leukemia; 9861/3 - AML, NOS; 9865/3 - AML with t(6;9)(p23;q34), DEK-NUP214; 9866/3 - acute promyelocytic leukemia (AML with t(15;17)(q22;q12)), PML/RARA; 9867/3 - acute myelomonocytic leukemia; 9869/3 - AML, inv(3). The following cases were excluded: age at diagnosis <18 years; unknown survival time; unknown marital status; and unknown race/ethnicity. Because the small number of patients from Alaska might bias the survival analysis, these patients were also excluded.
The following variables were analyzed: AML subtype, sex, age at diagnosis, region, race/ethnicity, marital status, chemotherapy, cause-specific death, and vital status. It is worth noting that the race/ethnicity category "yellow" included Chinese, Korean, and Japanese patients in this study. Additionally, for marital status, married included separated, whereas single included never married or unmarried. Based on patient prognosis, [17,18] AML was divided into acute promyelocytic leukemia (APL) and non-APL. The follow-up time was recorded as the time from diagnosis to death or to the last date of survival information documented in the SEER registry. The variable "vital status recode" was used to determine survival status.
After exclusion of patients based on the above criteria, 58,882 AML patients were identified for OS analysis. Furthermore, after excluding patients who died of causes other than cancer, 42,652 patients were included in the cancer-specific survival (CSS) analysis. Finally, patients were randomly assigned to a training cohort and a validation cohort (1:1 ratio) for the OS and CSS analyses (Fig. 1). Because the clinical information of adult AML patients is publicly available in the SEER program, approval from a local ethics committee was not needed.
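As an illustration only, a 1:1 random assignment of this kind could be reproduced along the following lines. This is a minimal Python sketch and not the procedure used in the study; the function name split_one_to_one, the fixed seed, and the use of consecutive integers as patient identifiers are assumptions made for the example.

```python
import random

def split_one_to_one(patient_ids, seed=0):
    """Randomly split identifiers into two cohorts of (near-)equal size."""
    ids = list(patient_ids)
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]  # (training, validation)

# Hypothetical identifiers 0..58881 standing in for the 58,882 OS patients.
training, validation = split_one_to_one(range(58882))
print(len(training), len(validation))  # 29441 29441
```

Fixing the seed keeps the cohort membership stable between runs, which matters if the training and validation results are to be regenerated later.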

Statistical analysis
Qualitative variables were categorized prior to modeling based on clinical experience and significance. For the continuous variable age, the optimal cutoffs were obtained using X-tile software version 3.6.1 (Yale University, New Haven, CT). [19] Univariate and multivariate analyses were performed using Cox proportional hazard regression models in SPSS Statistical Package version 22.0 (IBM, Chicago, IL) to clarify the independent prognostic value of clinical variables for OS and CSS. Clinically significant variables for OS and CSS, selected in multivariate Cox proportional hazard regression models in a backward stepwise manner based on the Akaike information criterion, were assessed for incorporation into the nomogram models. The foreign, rms, Hmisc, lattice, survival, Formula, and ggplot2 packages in R, version 3.5.1 (http://www.r-project.org/) were applied for nomogram model analysis. Model performance was assessed by internal and external validation, in which discrimination was measured with the concordance index (C-index) and calibration was evaluated with calibration curves, using 1000 bootstrap resamples. Then, every patient in each cohort was given a total score using the points obtained from the nomogram models, which could predict the survival rates of AML patients. The patients were randomly assigned using Microsoft Excel 2007. Receiver operating characteristic (ROC) curves were used to assess the predictive ability of the nomograms in SPSS Statistical Package version 22.0 (IBM, Chicago, IL). A 2-tailed P value <.05 was considered to indicate statistical significance. This study was performed in accordance with the ethical principles of the Declaration of Helsinki for medical research involving human participants. [20]

Results

Cohort characteristics
The clinical characteristics of the patients in the training and validation cohorts for OS and CSS analysis are listed in Table 1.

X-tile for the optimal cutoff of age
X-tile software was used to determine the optimal cutoff values of age in all AML patients (n = 58,882) after screening; these cutoffs were applied in the univariate and multivariate Cox proportional hazard regression analyses and in nomogram construction. As shown in Figure 2, the optimal cutoffs divided age into <62, 62-74, and >74 years, with significant differences in survival among the three groups.
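For clarity, the resulting three-level age variable can be written as a simple mapping. The sketch below is illustrative Python; the function name age_group and the string labels are assumptions, and it only encodes the cutoffs reported above (it does not reproduce the X-tile search itself).

```python
def age_group(age_at_diagnosis):
    """Map age in years to the three categories <62, 62-74, and >74."""
    if age_at_diagnosis < 18:
        # Patients younger than 18 years were excluded from the study.
        raise ValueError("patients younger than 18 years were excluded")
    if age_at_diagnosis < 62:
        return "<62"
    if age_at_diagnosis <= 74:
        return "62-74"
    return ">74"

for age in (45, 62, 74, 75):
    print(age, age_group(age))
```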

Cox regression analysis of training cohort
Univariate Cox proportional hazard regression analysis for OS and CSS suggested that survival differed significantly by AML subtype, age, sex, region, race/ethnicity, marital status, and chemotherapy; these variables were therefore included in the multivariate Cox regression analysis (Table 2). As shown in Table 3, multivariate Cox proportional hazard regression models demonstrated that AML subtype, age, sex, region, race/ethnicity, marital status, and chemotherapy were independent prognostic factors in the OS analysis, whereas AML subtype, age, sex, region, marital status, and chemotherapy, but not race/ethnicity, were independent prognostic factors in the CSS analysis.
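For readers less familiar with Cox regression output, the link between a fitted coefficient and the corresponding hazard ratio with its 95% confidence interval can be sketched as follows. The coefficient and standard error in the example are hypothetical values chosen for illustration and are not taken from Table 3; the function name hazard_ratio is likewise an assumption.

```python
import math

def hazard_ratio(beta, se, z=1.96):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error.

    The hazard ratio is exp(beta); the Wald interval is exp(beta -/+ z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient for one covariate level versus its reference level.
hr, lower, upper = hazard_ratio(beta=0.53, se=0.02)
print(f"HR = {hr:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

A covariate is usually called an independent prognostic factor when this interval excludes 1, which corresponds to P < .05 for the Wald test.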

Nomograms of AML for CSS and OS
Chen et al. Medicine (2019) 98:21
The clinical parameters selected by multivariate Cox regression were used to construct the nomograms in the training cohort (Fig. 3). However, because race/ethnicity was not statistically significant (P > .05) in the multivariate Cox regression for CSS, it was not included in the CSS nomogram. Details of the labels for tick marks and points in the nomograms are shown in Table 4.
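To make the point scale concrete, the following Python sketch shows one common way in which nomogram points are derived from Cox coefficients: the predictor with the widest coefficient range is scaled to 100 points and every other level is scaled proportionally. All coefficients below are hypothetical and do not come from Table 3 or Table 4; the names points_table and total_points are assumptions, and only three predictors are included for brevity.

```python
# Hypothetical Cox coefficients (log hazard ratios) relative to each reference level.
coefs = {
    "subtype": {"APL": 0.0, "non-APL": 1.10},
    "age": {"<62": 0.0, "62-74": 0.55, ">74": 0.95},
    "chemotherapy": {"yes": 0.0, "no": 0.90},
}

def points_table(coefs):
    """Scale coefficients so that the widest-ranging predictor spans 0-100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    table = {}
    for var, levels in coefs.items():
        lowest = min(levels.values())
        table[var] = {lvl: (b - lowest) / widest * 100.0 for lvl, b in levels.items()}
    return table

def total_points(patient, table):
    """Sum the points of a patient's levels across all predictors."""
    return sum(table[var][patient[var]] for var in table)

table = points_table(coefs)
patient = {"subtype": "non-APL", "age": ">74", "chemotherapy": "no"}
print(round(total_points(patient, table), 1))
```

The total is then read against the survival-probability axes of the nomogram; that final mapping depends on the fitted baseline survival and is not reproduced here.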

Internal validation
With 1000 bootstrap resamples, the C-indexes were 0.712 and 0.703 for the CSS and OS nomograms, respectively, indicating that both nomograms had reasonably good discrimination. The calibration curves further showed that the predicted 1-, 3-, and 5-year CSS and OS probabilities agreed well with the actual observations (Fig. 4).
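The C-index reported here can be understood as the proportion of comparable patient pairs in which the patient with the higher predicted risk dies first. The Python sketch below computes Harrell's C-index and a plain bootstrap distribution of it on a small hypothetical dataset. It is an illustration only: it does not reproduce the validation routine of the rms package, and the function names, the seed, and the data are assumptions.

```python
import random

def harrell_c_index(times, events, risks):
    """Harrell's C: share of usable pairs ordered correctly by predicted risk."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:  # patient i died before patient j's last follow-up
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

def bootstrap_c_indexes(times, events, risks, n_boot=1000, seed=0):
    """C-index on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n, values = len(times), []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            values.append(harrell_c_index([times[k] for k in idx],
                                          [events[k] for k in idx],
                                          [risks[k] for k in idx]))
        except ValueError:
            continue  # skip resamples without any usable pair
    return values

# Hypothetical follow-up times (months), death indicators, and predicted risks.
times = [5, 10, 15, 20, 25]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.3, 0.1]
c_index = harrell_c_index(times, events, risks)
boot = bootstrap_c_indexes(times, events, risks)
print(c_index)  # 0.875
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so values slightly above 0.7 indicate moderate discrimination.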

External validation
In the external validation cohort, the C-indexes for CSS and OS were 0.712 and 0.705, respectively (Fig. 5). The external calibration curves also showed good agreement between predicted and observed 1-, 3-, and 5-year CSS and OS. The discrimination and calibration results in the external cohort confirmed that the nomogram models in this study could predict the CSS and OS rates of patients with AML with comparative accuracy.
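A calibration curve of this kind compares, within groups of patients with similar predictions, the mean predicted survival probability with the observed survival estimated by the Kaplan-Meier method. The Python sketch below illustrates the idea on hypothetical data; the grouping into two equal-sized groups, the function names, and all values are assumptions, and no bootstrap correction is applied.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of the probability of surviving beyond time t."""
    surv = 1.0
    death_times = sorted({u for u, e in zip(times, events) if e == 1 and u <= t})
    for u in death_times:
        at_risk = sum(1 for v in times if v >= u)
        deaths = sum(1 for v, e in zip(times, events) if v == u and e == 1)
        surv *= 1.0 - deaths / at_risk
    return surv

def calibration_points(predicted, times, events, t, n_groups=2):
    """Return (mean predicted, observed) survival at time t for each group."""
    order = sorted(range(len(predicted)), key=lambda k: predicted[k])
    size = len(order) // n_groups
    points = []
    for g in range(n_groups):
        # The last group takes any remainder so every patient is used once.
        idx = order[g * size:] if g == n_groups - 1 else order[g * size:(g + 1) * size]
        mean_pred = sum(predicted[k] for k in idx) / len(idx)
        observed = kaplan_meier([times[k] for k in idx], [events[k] for k in idx], t)
        points.append((mean_pred, observed))
    return points

# Hypothetical predicted 12-month survival, follow-up (months), and death indicators.
predicted = [0.2, 0.3, 0.25, 0.35, 0.7, 0.8, 0.75, 0.9]
times = [2, 4, 5, 12, 14, 20, 9, 24]
events = [1, 1, 1, 0, 1, 0, 1, 0]
points = calibration_points(predicted, times, events, t=12)
for mean_pred, observed in points:
    print(round(mean_pred, 3), round(observed, 3))
```

Points lying close to the 45-degree line (predicted equal to observed) indicate good calibration.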

ROC curves for CSS and OS
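The predictive ability of the nomograms for CSS and OS in the training cohorts was assessed using ROC curves. The area under the curve (AUC) of the ROC curves for CSS and OS was 0.799 (95% confidence interval: 0.792–0.806) and 0.809 (95% confidence interval: 0.803–0.816), respectively.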
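The AUC can be interpreted as the probability that a randomly chosen patient who died has a higher nomogram score than a randomly chosen patient who survived. The Python sketch below computes it by direct pairwise comparison on hypothetical data. It assumes a binary outcome (for example, vital status at a fixed time point), and the function name roc_auc and all values are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the share of (event, non-event) pairs ranked correctly by score."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half
    return wins / (len(pos) * len(neg))

# Hypothetical death indicators and total nomogram points for six patients.
labels = [1, 1, 0, 1, 0, 0]
scores = [180, 150, 160, 120, 90, 60]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.778
```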

Discussion
The nomogram model, compared with other predictive models, integrates different clinical variables to offer a more accurate and personalized prognosis assessment system. [13][21][22][23] In this work, we developed 2 nomogram models based on the SEER database to predict CSS and OS for adult patients with AML. Although the predicted and observed probabilities of 1-year OS in the nomograms were not completely consistent, the C-indexes were all higher than 0.7, indicating considerable prediction accuracy and repeatability when the nomograms were applied to the training and validation cohorts. In addition, the nomograms showed good predictive accuracy for 3- and 5-year CSS and OS. The AUC values were consistent with the C-indexes, indicating that the models could provide a good prognostic assessment system for patients with AML.
Here, some variables in the nomogram models were analyzed. Acute promyelocytic leukemia (APL) is generally characterized by the t(15;17)(q22;q21) chromosomal translocation, which generates the PML-RARA fusion gene, the target of all-trans retinoic acid. Over the past years, owing to the use of all-trans retinoic acid and arsenic trioxide (As2O3), the clinical complete remission (CR) rate and disease-free survival of APL have improved significantly, and the CR rate now exceeds 90%. [17] However, the 3-year OS rate of non-APL AML remains poor, at less than 30%. [18] In the present study, the points for non-APL AML in the nomograms for CSS and OS were 100 and 94, respectively, indicating that AML subtype was a strong predictor of prognosis in the nomogram models built from the AML data of the SEER program.
With prolonged life expectancy, the incidence of AML is rising in the aging population. Over the past few decades, with the great progress made in the diagnosis and treatment of AML, the outcome of young patients has greatly improved, but the prognosis of elderly patients (>60 years old), whose long-term OS rate is less than 10%, remains very poor. [18,24] The risk ratios for age were more than 1.7 in multivariate Cox regression and the points for age in the nomograms for CSS and OS were all more than 60, suggesting that age, especially >74 years, was a strong predictor of outcome in patients with AML.
AML is a rapidly progressing hematopoietic malignancy with a natural course of only a few months. [2] However, 50% to 60% of patients with AML can achieve CR after intensive induction chemotherapy, and the long-term OS rate after chemotherapy can be improved to 15% to 30%. [25,26] We found that patients who did not receive chemotherapy had risk ratios of more than 1.8 and that the corresponding points in the nomograms were all more than 80, indicating that chemotherapy status played an important role in predicting patient outcomes.
Studies have shown that sex, region, and marital status are predictors of outcomes in AML patients, [7,27] which is consistent with our findings. However, compared with AML subtype, age, and chemotherapy, the points for sex, region, and marital status in the nomograms were low, showing that their predictive ability was relatively weak.
However, it is worth noting that the population-based data of the SEER program usually do not include detailed clinical data such as white blood cell count, [28] relapse, [29] and risk stratification, [18] which might help to improve the reliability and accuracy of the nomogram models. Hence, larger clinical datasets are needed to validate the accuracy and repeatability of the nomogram models in the future.
Overall, in this study, the bootstrap-corrected and ROC curve-validated nomogram models could predict 1-, 3-, and 5-year survival probabilities with comparative accuracy and were clinically practical and relatively reliable in adult patients with AML. However, an independent external validation dataset will still be required to validate the nomogram models in the future and make them more reliable.