Nomogram for Predicting the Overall Survival of Adult Patients With Primary Gastrointestinal Diffuse Large B Cell Lymphoma: A SEER-Based Study

Background: The aim of this study was to establish a precise prognostic model, based on significant clinical parameters, for predicting the overall survival (OS) of adult patients with primary gastrointestinal diffuse large B cell lymphoma (GI DLBCL). Materials and Methods: Data on 7,121 GI DLBCL patients diagnosed between 1997 and 2015 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. The patients were randomly divided into a training cohort (n = 5,697) and a validation cohort (n = 1,424). Receiver operating characteristic (ROC) curves and calibration curves were used to evaluate the predictive performance of the nomogram. Results: The median OS in the training cohort was 76 months (1–239 months), and the 3-, 5-, and 10-year OS rates were 60.3, 53.9, and 39.5%, respectively. Age at diagnosis, Ann Arbor stage, and marital status were important clinical predictors of OS and were used to build the nomogram. The areas under the ROC curve (AUCs) of the nomogram for predicting 3-, 5-, and 10-year OS were 0.669, 0.692, and 0.740, respectively. The ROC and calibration curves indicated good accuracy in predicting the prognosis of GI DLBCL. Conclusion: The established nomogram was validated for predicting OS in adult patients with GI DLBCL. This predictive model could help clinicians identify high-risk patients to improve their prognosis.


INTRODUCTION
Primary gastrointestinal (GI) lymphoma is the most common type of extranodal lymphoma, accounting for about 25% of all primary extranodal lymphomas (1). However, primary GI lymphoma constitutes only about 1-4% of all GI cancers (2). More than half of the cases occur in the stomach, followed by the small intestine and ileocecum (2). Histopathological types include marginal zone lymphoma of mucosa-associated lymphoid tissue (MALT), diffuse large B-cell lymphoma (DLBCL), enteropathy-associated T-cell lymphoma (EATL), mantle cell lymphoma (MCL), and others. DLBCL is the most common histological type of GI lymphoma, with an estimated prevalence of 40-50% (2,3). The next most common histological type is MALT lymphoma (4). GI DLBCL differs from nodal lymphomas in its clinical characteristics and prognosis. C-myc rearrangements, which are more common in GI DLBCL than in nodal lymphomas, do not seem to negatively influence the prognosis (6). GI DLBCL is usually diagnosed with a low or intermediate International Prognostic Index (IPI). In a retrospective analysis, patients with GI DLBCL showed better overall survival (OS) than patients with DLBCL of nodal or other extranodal sites (7). Nevertheless, because these tumors are rare, only a few small-sample studies have searched for prognostic factors in recent years. Recently, the Surveillance, Epidemiology, and End Results (SEER) database has been used to identify predictive factors and to develop nomograms that predict long-term survival. In this study, we used patient records from the SEER database to establish a novel nomogram for predicting the overall survival of adult patients with primary GI DLBCL.

Data Source and Study Population
Data for analysis were extracted from the SEER program of the National Cancer Institute. The SEER program statistical analysis software (SEER*Stat, Version 8.3.6) was used to retrieve the data of adult patients (≥18 years) diagnosed with GI DLBCL between 1997 and 2015. The starting year was chosen because in 1997 rituximab became the first targeted drug approved by the FDA for the treatment of B-cell NHL, marking the beginning of the rituximab era. The following information was obtained for each patient: age at diagnosis, sex, race, marital status, Ann Arbor stage, primary site, surgery, survival time, and survival status. Patients with missing data for any of these characteristics were excluded from the study. A total of 7,121 adult GI DLBCL patients were randomly divided into a training cohort (n = 5,697) and a validation cohort (n = 1,424). Marital status was recorded as married or single (never married, divorced, or widowed).
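The random partition described above corresponds to an approximately 80/20 split. The following Python sketch illustrates one way such a split could be performed; it is not the authors' code (the study used SPSS and R), and the function name `split_cohort`, the 0.8 fraction, the seed, and the rounding rule are assumptions chosen so that 7,121 records yield the reported cohort sizes.

```python
import random

def split_cohort(patient_ids, train_fraction=0.8, seed=2020):
    """Randomly partition patient IDs into training and validation sets.

    The 0.8 fraction and the seed are illustrative assumptions; the paper
    reports only the resulting cohort sizes.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Hypothetical IDs 0..7120 stand in for the 7,121 SEER records.
train_ids, valid_ids = split_cohort(range(7121))
print(len(train_ids), len(valid_ids))  # 5697 1424
```

Fixing the seed keeps the two cohorts stable across reruns, which matters when the validation cohort is meant to remain unseen during model building.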

Construction and Validation of the Nomogram
Data from the training cohort were used to establish the nomogram. The endpoint was OS, measured from the date of first diagnosis to the date of death from any cause. Survival was estimated using the Kaplan-Meier method, and prognostic factors were assessed by Cox regression analysis. Factors significantly associated with OS were used to construct the nomogram for OS.
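For readers less familiar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines of Python. This is an illustration only: the functions `kaplan_meier` and `survival_at` are hypothetical helpers, and the eight follow-up records are toy values, not SEER data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up in months
    events : 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk   # product-limit step
            curve.append((t, survival))
        n_at_risk -= removed                        # deaths and censored leave the risk set
    return curve

def survival_at(curve, horizon):
    """Estimated survival probability at `horizon` months (step function)."""
    s = 1.0
    for t, surv in curve:
        if t > horizon:
            break
        s = surv
    return s

# Toy follow-up data (months, death indicator); purely illustrative.
toy_times = [6, 12, 12, 30, 40, 60, 75, 90]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print(f"{survival_at(toy_curve, 36):.3f}")  # 0.600 (3-year OS in the toy data)
print(f"{survival_at(toy_curve, 60):.3f}")  # 0.400 (5-year OS in the toy data)
```

Censored patients contribute to the risk set until their last follow-up but never trigger a drop in the curve, which is why the estimate differs from a simple proportion of survivors.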
The nomogram was internally and externally validated with 1,000 bootstrap resamples. Calibration curves were created using the marginal estimate and the model average prediction probability. ROC analysis was used to evaluate predictive performance (8).
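The two ingredients of this validation step, an AUC and bootstrap resampling, can be illustrated with a simplified Python sketch. It treats survival status at a fixed time horizon as a binary label and ignores censoring, whereas time-dependent ROC methods account for it; the functions `auc` and `bootstrap_auc` and the toy scores are assumptions for illustration and do not reproduce the authors' R analysis.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a case outranks a control (ties count 0.5).

    scores : predicted risk, higher means worse expected outcome
    labels : 1 if the patient died before the horizon, 0 otherwise
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def bootstrap_auc(scores, labels, n_resamples=1000, seed=0):
    """Percentile 95% interval for the AUC from bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                          # skip one-class resamples
            estimates.append(auc([scores[i] for i in idx], ys))
    estimates.sort()
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[int(0.975 * n_resamples) - 1]
    return lower, upper

# Toy risk scores and outcomes; purely illustrative.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
toy_interval = bootstrap_auc(toy_scores, toy_labels)
print(toy_auc)  # 0.75
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which gives a scale for reading the reported values.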

Statistical Analysis
Statistical analysis was performed using SPSS Statistics 21 and R version 3.6.3. A two-sided p < 0.05 was considered statistically significant.

Clinical Characteristics of the Patients
A total of 7,121 adult GI DLBCL patients were identified from the SEER database. Patients were randomly assigned to a training cohort (n = 5,697) or a validation cohort (n = 1,424). Patient characteristics are shown in Table 1.

OS and Significant Prognostic Factors in the Training Cohort
The median OS in the training cohort was 76 months (1-239 months), and the 3-, 5-, and 10-year OS rates were 60.3, 53.9, and 39.5%, respectively. As shown in Figure 1, age at diagnosis, Ann Arbor stage, and marital status were important clinical predictors of OS. The results of the univariate and multivariate analyses are listed in Table 2.

Prognostic Nomogram for OS
The prognostic nomogram for 3-, 5-, and 10-year OS is shown in Figure 2. OS was better for younger patients, patients with stage I disease, and married patients. Using the nomogram, patients were stratified into different risk groups to evaluate OS (Figure 3).
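A nomogram of this kind typically rescales the coefficients of the underlying regression model so that the predictor level with the largest effect receives 100 points, and a patient's total is the sum across predictors. The Python sketch below illustrates that mechanism only. The log hazard ratios in `COEFS` are hypothetical placeholders, not the fitted values from this study, and the numeric age-group labels 1 to 5 merely stand in for the five age categories.

```python
# Hypothetical log hazard ratios (reference level = 0.0). These are NOT the
# fitted coefficients of the study; they only illustrate the mechanics.
COEFS = {
    "age_group": {1: 0.0, 2: 0.4, 3: 0.8, 4: 1.3, 5: 2.0},
    "stage": {"I": 0.0, "II": 0.2, "III-IV": 0.6},
    "marital": {"married": 0.0, "single": 0.3},
}

def build_point_table(coefs, max_points=100):
    """Rescale coefficients so the largest within-predictor effect equals max_points."""
    largest = max(
        max(levels.values()) - min(levels.values()) for levels in coefs.values()
    )
    table = {}
    for var, levels in coefs.items():
        base = min(levels.values())    # lowest-risk level gets 0 points
        table[var] = {
            lvl: round((beta - base) / largest * max_points, 1)
            for lvl, beta in levels.items()
        }
    return table

def total_points(patient, table):
    """Sum the points for each of the patient's predictor levels."""
    return sum(table[var][patient[var]] for var in table)

POINTS = build_point_table(COEFS)
example_patient = {"age_group": 4, "stage": "III-IV", "marital": "single"}
print(total_points(example_patient, POINTS))  # 110.0 under the placeholder coefficients
```

On a printed nomogram the total is then read against a separate axis that maps points to predicted 3-, 5-, and 10-year survival probabilities; that mapping depends on the fitted baseline survival and is not reproduced here.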

Validation of Predictive Accuracy of the Nomogram for OS
In the validation cohort, the median OS was 74 months (1-236 months), and the 3-, 5-, and 10-year OS rates were 59.7, 53.1, and 38.3%, respectively. The AUCs of the nomogram for predicting 3-, 5-, and 10-year OS were 0.669, 0.692, and 0.740, respectively (Figure 4). The internal and external calibration curves showed good agreement between the nomogram predictions and the observed probabilities of 3-, 5-, and 10-year survival (Figure 5).
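The idea behind a calibration curve can be illustrated by grouping patients according to their predicted survival probability and comparing the mean prediction in each group with the observed survival fraction. The Python sketch below is a simplification under stated assumptions: it ignores censoring (a full analysis would use Kaplan-Meier estimates within each group), and the function `calibration_table` and the toy values are hypothetical.

```python
def calibration_table(predicted, observed, n_bins=3):
    """Return [(mean predicted survival, observed survival fraction)] per bin.

    predicted : model-predicted probability of surviving to the horizon
    observed  : 1 if the patient was alive at the horizon, 0 otherwise
    """
    pairs = sorted(zip(predicted, observed))   # order patients by prediction
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_frac = sum(o for _, o in chunk) / len(chunk)
        rows.append((mean_pred, obs_frac))
    return rows

# Toy predictions and outcomes; purely illustrative.
toy_pred = [0.2, 0.3, 0.25, 0.5, 0.55, 0.6, 0.8, 0.85, 0.9]
toy_alive = [0, 0, 1, 1, 0, 1, 1, 1, 1]
toy_rows = calibration_table(toy_pred, toy_alive)
for mean_pred, obs_frac in toy_rows:
    print(f"predicted {mean_pred:.2f}  observed {obs_frac:.2f}")
```

Points lying close to the 45-degree line (predicted equal to observed) indicate good calibration; systematic departures indicate over- or under-prediction in that risk range.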

DISCUSSION
Although primary GI DLBCL has been studied extensively in the past (9-12), its clinicopathological features are poorly described.
Most DLBCLs occur in patients above the age of 60, with a slight male predominance (1). We also observed this demographic feature in the present study. A growing number of studies have shown that primary GI DLBCL differs from nodal DLBCL in clinical characteristics and treatment outcomes (13-15). New prognostic evaluation systems are needed to identify high-risk patients. As a mathematical model with a graphical representation, the nomogram helps to determine the probability of a clinical event by combining clinical, pathological, and biological variables. The nomogram integrates the effects of several separate variables to give an individualized risk estimate for each patient. Compared with traditional prognostic systems, nomograms have shown better prediction in cancer populations based on the SEER database (16-18). Nomograms are increasingly used for estimating lymphoma prognosis (19,20). To our knowledge, this is the largest retrospective case series of primary GI DLBCL aimed at developing a prognostic model to predict OS. In the present study, we developed a nomogram to predict the prognosis of patients with GI DLBCL based on three significant factors: age at diagnosis, Ann Arbor stage, and marital status. Age is a well-known prognostic parameter for various cancers. As in other prognostic systems (21-24), age at diagnosis was used as an independent prognostic factor. With reference to the age classification of the NCCN-IPI (23), we divided patients into five age groups based on the data from the SEER database. Survival rates decreased significantly with increasing age at diagnosis.
Several staging systems have been developed over the past decades to improve the prognostic stratification of NHL. The Ann Arbor staging system is widely used for staging of NHL. The Lugano staging system is a modification of the original Ann Arbor staging system designed for the staging of GI lymphoma. It was developed to incorporate measures of depth of invasion and distant nodal involvement. The Ann Arbor classification is considered inadequate for the staging of GI DLBCL, and the most widely used classification is the Lugano staging system, which was adopted by the eighth edition of the Cancer Staging Manual of the American Joint Committee on Cancer (5). The SEER database provides only Ann Arbor staging data, and our survival analysis showed that Ann Arbor stage was an independent prognostic factor for GI DLBCL. Following the Lugano classification, we combined stage III and stage IV patients into a single group, which differs from the approach used for nodal DLBCL.
Marital status is not only a risk factor for developing cancer (25,26) but also an independent prognostic indicator in many cancers (27-30). Married patients may have relatively strong financial resources, which could make it easier to obtain better therapies and thus be associated with a better prognosis. In addition, they may receive additional care from their spouses. Le Guyader-Peyrou et al. found that marital status was independently associated with the 5-year relative survival of patients with DLBCL (31). However, socio-economic status was not associated with outcome (31). Our findings indicate that the prognosis of married patients with GI DLBCL is better than that of unmarried patients. We did not have information on socio-economic status that could be used for prognostication.
There are very few data illustrating the impact of the IPI on primary GI DLBCL, and the results of studies examining the prognostic effect of the IPI in primary GI lymphoma have been inconsistent. A retrospective multicenter clinical study of 299 B-cell lymphoma cases revealed IPI ≥2 to be an independent prognostic factor for worse OS (32). However, a study of 137 patients by Shi et al. found no apparent significant correlation between IPI and survival (33). The Lugano staging system has also been used to modify the IPI for primary GI NHL (34,35). Patients with a stage-modified IPI ≥2 had a median survival time (MST) of 44 months (34), and our study showed an MST of 28 months for higher-risk patients. However, because of the lack of data in the SEER database, we could not compare the proposed nomogram with the IPI directly. The SEER database has many advantages, its strengths resting primarily on its large sample size, inclusion of more diverse subsets of patients, and complete survival data. Nevertheless, our results must be interpreted carefully because there are some important limitations.
First, owing to the nature of the SEER database, much clinical, pathological, and biological information on individual risk factors, such as the IPI and some molecular markers, was not available. Survival analysis was therefore limited to a few factors and could not be further refined in high-risk patients. This issue merits examination in other very large clinical data sets in which these factors are directly ascertained. The second limitation of this study is its retrospective nature: data integrity and homogeneity are not guaranteed. Nevertheless, the patient population is relatively large, and the findings on prognostic factors are consistent with other studies (31,36,37). Finally, the treatment regimens of the included patients were unclear. We included only data after 1996 so that most patients might have received rituximab therapy. The use of new targeted agents might modify the clinical outcome of GI DLBCL (38,39). Observational analyses using the SEER database can provide important hypothesis-generating data, from which future practice-changing prospective trials can be built.
In conclusion, the nomogram using age, Ann Arbor stage, and marital status permitted predictions about overall survival in adult patients with GI DLBCL. This predictive tool could help clinicians identify high-risk patients to improve their prognosis.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found here: Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/).

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
JW and MZ: had full access to all of the data in the study, took responsibility for the integrity of the data, the accuracy of the data analysis, statistical analysis, and drafting of the manuscript. RZ: analysis and interpretation of data. JX and BC: supervision. All authors: concept and design.

FUNDING
This work was supported by the Nanjing Medical Science and Technique Development Foundation (YKK15068 and YKK17074).