Incidence and Mortality Trends and Risk Prediction Nomogram for Extranodal Diffuse Large B-Cell Lymphoma: An Analysis of the Surveillance, Epidemiology, and End Results Database

Background: Diffuse large B-cell lymphoma (DLBCL) is the most common type of non-Hodgkin's lymphoma and may arise at various extranodal sites, but little is known about the specific trends of extranodal DLBCL. Methods: A total of 15,882 extranodal DLBCL patients from the Surveillance, Epidemiology, and End Results (SEER) database (1973–2015) were included in the incidence analysis. Joinpoint regression software was used to calculate the annual percent change (APC) in rates. Nomograms were established with R software to predict overall survival (OS). Results: The incidence of extranodal DLBCL rose at a rate of 1.6% (95% CI, 0.4–2.8, p < 0.001) per year until it began to decline around 2003. The incidence-based mortality trend of extranodal DLBCL followed a similar pattern, with a decrease beginning around 1993. Five-year survival rates improved dramatically from the 1970s to the 2010s (44.15 vs. 63.7%), and the most obvious increase occurred in DLBCL patients with a primary site in the head/neck. The C-index for OS was 0.708, indicating that the nomograms performed well and were able to predict the prognosis of patients with extranodal DLBCL. The calibration curves showed satisfactory consistency between observed and predicted values for 1-, 5-, and 10-year OS. Conclusions: The incidence and incidence-based mortality of extranodal DLBCL increased for decades, followed by a promising downward trend in recent years. These findings may help scientists identify disease-related risk factors and better manage the disease. The prediction signature could identify high-risk patients who should receive effective therapies to prevent death from this disease, and low-risk patients in whom over-treatment can be reduced.


INTRODUCTION
Non-Hodgkin lymphoma (NHL) was the tenth most common cancer in the world in 2018 (1), and its most common subtype is diffuse large B-cell lymphoma (DLBCL), which comprises approximately 30% of NHL (2). Lymphoma can arise in any tissue, and approximately one third of patients present with extranodal disease (3). In the United States, NHL incidence increased by about 80% from 1975 to 2003, and extranodal NHL accounted for most of the new cases (4). Etiological exposures, as well as the patterns of incidence and mortality, may vary according to anatomical site. Existing data describing the incidence and mortality trends of DLBCL have not specifically addressed extranodal DLBCL, considered the effects of tumor properties and patient demographic characteristics, or systematically compared incidence and mortality trends based on these characteristics (5).
In recent years, the survival rate of most DLBCL patients has increased dramatically since R-CHOP (rituximab with cyclophosphamide, doxorubicin, vincristine, and prednisone) was introduced as first-line therapy (6). However, treatment-related side effects and long-term complications have also increased, and a significant proportion of patients do not respond to initial treatment or ultimately relapse. At present, the most routinely used prognostic system in DLBCL patients is the International Prognostic Index (IPI). The number of extranodal sites is one of the evaluation criteria of the IPI score. However, Lu CS et al. indicated that the particular extranodal sites involved have better predictive value than the number of extranodal sites involved (7). In addition, there is increasing evidence that primary extranodal sites reflect distinct clinical features and prognostic implications and require specific therapy (8,9). Therefore, a new risk stratification signature that includes the extranodal site of origin is needed to guide the treatment of extranodal DLBCL. As a graphical expression of a mathematical model, a nomogram combines information from several features to forecast a specific outcome in clinical practice. By integrating various significant factors, a nomogram can estimate the probability of an event, such as death or recurrence, for each patient. Thus, the nomogram has evolved into an important instrument for forecasting the clinical outcomes of various types of cancer and could help physicians choose optimal therapy schemes.
The present study aimed to explore the incidence and mortality trends of extranodal DLBCL by primary site and patient demographic characteristics using the Surveillance, Epidemiology, and End Results (SEER) cancer registry during 1973-2015. The results may help scientists identify disease-related risk factors and better manage the disease. In addition, we developed a new prediction model specifically for patients with extranodal DLBCL in the rituximab era that may provide accurate risk stratification for individual patients and guide treatment.

Data Sources
Patients to Estimate Incidence and IBM: SEER 9
The incidence cases came from the registries of the SEER-9 cancer incidence file of the US National Cancer Institute from 1973 to 2015. Because the SEER database has recorded Ann Arbor stage only since 1983, we limited the analysis of the impact of Ann Arbor stage on incidence to the period from 1983 onward.
Incidence-based mortality (IBM) cases differ from traditional mortality cases in that mortality records are linked to incident cancer cases. IBM cases were included only from 1988 to 2015 so that the great majority of deaths could be linked to cases diagnosed from 1973 onward and IBM rates would not be underestimated in the first few years. This interval was chosen because the mean survival time for patients with DLBCL is 10 years.

Patients to Estimate Survival and Construct the Nomogram: SEER 18
For the survival analysis, patients diagnosed from 1973 to 2015 were enrolled from the SEER 18 registries, which cover 28% of the US population.
For construction of the nomogram, patients diagnosed from 2002 to 2015 were selected from the SEER 18 registries. Patients diagnosed before 2002 were excluded because we aimed to build a prediction model for the rituximab era, and 79% of DLBCL patients have received rituximab as first-line treatment since 2002 (10).

Study Population Selection
Eligible patients were diagnosed with lymphoma with International Classification of Diseases for Oncology, third edition (ICD-O-3) histology code 9680 (diffuse large B-cell lymphoma, not otherwise specified). Patients were excluded if (1) DLBCL was not the primary malignancy; (2) they were not under active follow-up; or (3) the primary site of the lymphoma was the central nervous system (CNS), the mediastinum, or unknown. Patients with primary CNS DLBCL and primary mediastinal DLBCL were excluded because their clinical features, prognosis, and treatment differ from those of other sites. The small number of patients whose race was unknown were excluded from the incidence and incidence-based mortality analyses (n = 66 and n = 7, respectively). In addition, patients who survived for less than a month were excluded from the survival analysis and the construction of the nomogram, because their survival time was recorded as 0 in the SEER database. Patients whose Ann Arbor stage or race was unknown were excluded when constructing the nomogram.

Statistical Analysis
SEER*Stat version 8.3.2 was used to calculate incidence and IBM rates. All rates were age adjusted to the 2000 US standard population and expressed per 100,000 person-years. The Joinpoint Regression Program, version 4.5.0.1, was used to analyze the incidence and mortality trends of extranodal DLBCL, and the annual percentage change (APC) and average annual percentage change (AAPC) were used to assess rate changes. We used the rms package in R to calculate 1-, 5-, and 10-year overall survival (OS). To determine how different variable levels were associated with survival, both relative to one another and individually, univariable and multivariable Cox proportional hazards regression models were used to calculate hazard ratios (HR) and 95% confidence intervals (CI). SPSS was used for survival analyses. P < 0.05 was considered statistically significant.
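The two rate calculations named above can be illustrated with a minimal Python sketch. It assumes direct age standardization and a single log-linear segment, for which the APC is (exp(b) − 1) × 100, where b is the slope of ln(rate) on calendar year. The function names and all input numbers are hypothetical; the sketch does not reproduce SEER*Stat or the Joinpoint program, which additionally selects the number and location of joinpoints.

```python
import math

def age_adjusted_rate(cases, person_years, std_weights):
    """Directly standardized rate per 100,000 person-years.

    Each age group's crude rate is weighted by its share of the
    standard population (weights are normalized here).
    """
    total_weight = sum(std_weights)
    rate = 0.0
    for d, py, w in zip(cases, person_years, std_weights):
        rate += (d / py) * (w / total_weight)
    return rate * 100_000

def annual_percent_change(years, rates):
    """APC for ONE segment from a least-squares fit of ln(rate) on year."""
    n = len(years)
    log_rates = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(log_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_rates))
    slope = sxy / sxx
    return (math.exp(slope) - 1.0) * 100.0

# Hypothetical series that grows by exactly 1.6% per year.
years = list(range(1990, 2000))
rates = [2.0 * 1.016 ** (y - 1990) for y in years]
apc = annual_percent_change(years, rates)

# Hypothetical two-age-group example with equal standard weights.
adjusted = age_adjusted_rate([10, 20], [100_000, 100_000], [0.5, 0.5])
```

Fitting on the log scale is what makes the APC a constant percentage per year within a segment, so segments of different lengths can be compared directly.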

Construction and Validation of the Nomogram
The nomogram was constructed in the following steps. First, the primary outcome was set to OS. Second, variables that may determine the outcome (e.g., age at diagnosis, sex, race, clinical stage, and primary site of involvement) were selected on the basis of a priori clinical hypotheses. Third, the survival-related factors were determined with a Cox proportional hazards regression model. Finally, a prognostic nomogram for OS was built in R based on the multivariate analysis above.
To apply the nomogram, the points for each patient characteristic are first read off by drawing a vertical line from the corresponding variable axis to the points scale. All points are then summed, and a vertical line is drawn from the total points scale to obtain the probability of 1-, 5-, and 10-year OS.
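The point-reading procedure can be expressed numerically. The sketch below assumes the usual convention that a nomogram rescales Cox coefficients so that the variable with the widest effect range spans 0 to 100 points, and that survival follows S(t | x) = S0(t)^exp(lp). The coefficients, the category levels, and the baseline survival are hypothetical placeholders, not the estimates of this study.

```python
import math

# Hypothetical log hazard ratios (Cox coefficients) relative to each
# reference level. Illustrative values only, NOT the paper's estimates.
COEFS = {
    "sex": {"female": 0.0, "male": 0.15},
    "stage": {"I/II": 0.0, "III/IV": 0.50},
    "age_group": {"<60": 0.0, "60-74": 0.70, ">=75": 1.40},
}

def build_point_table(coefs, max_points=100.0):
    """Rescale coefficients so the widest variable spans 0..max_points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    scale = max_points / widest  # points per unit of log hazard
    table = {}
    for var, levels in coefs.items():
        lowest = min(levels.values())
        table[var] = {lvl: (beta - lowest) * scale for lvl, beta in levels.items()}
    return table, scale

def total_points(table, patient):
    """Sum the points of the patient's level on every variable."""
    return sum(table[var][patient[var]] for var in table)

def survival_probability(points, scale, baseline_survival):
    """Convert total points back to a linear predictor and apply
    S(t | x) = S0(t) ** exp(lp), with S0 for the lowest-risk profile."""
    linear_predictor = points / scale
    return baseline_survival ** math.exp(linear_predictor)

TABLE, SCALE = build_point_table(COEFS)
patient = {"sex": "male", "stage": "III/IV", "age_group": ">=75"}
pts = total_points(TABLE, patient)

# Hypothetical 5-year survival of the lowest-risk profile.
BASELINE_5Y = 0.85
p5 = survival_probability(pts, SCALE, BASELINE_5Y)
```

Drawing the vertical lines on the printed nomogram is a graphical version of the same table lookup and sum.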
The concordance index (C-index) and calibration curves were used to assess the performance of the model. A C-index of 0.5 indicates discrimination no better than chance, and a larger value indicates stronger predictive ability of the model. The closer the calibration curve lies to the diagonal line on the calibration plot, the better the agreement between predicted and observed values. The model was validated with 1,000 bootstrap resamples.
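For readers unfamiliar with the C-index, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is a plain O(n²) illustration with made-up inputs, not the rms routine used in this study, and it omits the bootstrap optimism correction.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ordered correctly.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before subject j's follow-up time.
    The pair is concordant when i also has the higher predicted risk;
    tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up times (months), event flags (1 = death,
# 0 = censored), and model risk scores for four patients.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [4, 3, 2, 1])
uninformative = concordance_index(times, events, [1, 1, 1, 1])
```

A model that assigns every patient the same risk score yields 0.5, which is why 0.5 serves as the chance-level reference.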

Patient Characteristics in Incidence and Mortality Analysis
A total of 15,882 patients with extranodal DLBCL as the first diagnosed malignancy were included in the incidence analysis from the SEER database from 1973 to 2015. Patient characteristics are outlined in Table 1 (Figure 1). The incidence-based mortality trend of extranodal DLBCL had a similar pattern, with a decrease beginning around 1993 (Table 3). The annual percentage change in incidence-based mortality was 7.2% (95% CI, 2.4-12.1%) during 1988-1993, whereas it was −0.4% (95% CI, −0.8 to −0.0%) during 1993-2015 (Figure S1).

Trends by Sex
The

Trends by Stage
Both the incidence and incidence-based mortality of early stage patients showed an initial significant increase followed by a decrease (Figure 3). Incidence-based mortality initially grew at a rapid rate of 4.0% (95% CI, 2.4-5.5) from 1988 to 2000 and then declined at a rate of −1.5% (95% CI, −2.3 to −0.6) from 2000 to 2015 (Figure S3). Therefore, the incidence-based mortality rate declined 3 years earlier than the incidence rate. For advanced stage patients, the incidence and incidence-based mortality rates both started with a rapid increase, with a change occurring around 2007 and 1990, respectively, after which the incidence rate decreased and the mortality rate leveled off.

Trends by Age
There was no joinpoint in the trends of incidence and incidence-based mortality in children over the study period; incidence rose at a rate of 1.5% (95% CI, 0.2-2.8) and incidence-based mortality fell at a rate of −1.6% (95% CI, −2.5 to −0.7) per year. For adolescents and young adults (AYA) and the elderly, the incidence showed an initial increase, followed by declines at rates of −1.1% (95% CI, −2.0 to −0.3) and −2.3% (95% CI, −3.0 to −1.6) from around 1991 and 2003, respectively (Figure 4). The incidence-based mortality for the AYA and the elderly had a similar pattern, with a decrease beginning around 1993 and 2002, respectively. For adults, the incidence rose over the study period, except for a brief decline from 2007 to 2010, whereas the incidence-based mortality has been on a downward trend since 1995 (Figure S4). Notably, the incidence in the elderly was about 4.6 times that of adults, 16.6 times that of AYA, and about 141.9 times that of children (i.e., 9.93 vs. 2.14 cases per 100,000 in elderly vs. adults in 2003; 9.93 vs. 0.60 cases per 100,000 in elderly vs. AYA in 2003; 9.93 vs. 0.07 cases per 100,000 in elderly vs. children in 2003). The trends had similar slopes, but the annual rate of increase in AYA incidence was the fastest throughout the study period, about 2.4 times the rate for the elderly (i.e., 4.1 vs. 1.7%).
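The rate ratios quoted above follow directly from the 2003 rates given in the text, as this short check shows. Only the numbers stated in the paragraph are used; the variable names are illustrative.

```python
# Age-adjusted incidence per 100,000 in 2003, as quoted in the text.
rates_2003 = {"elderly": 9.93, "adults": 2.14, "AYA": 0.60, "children": 0.07}

# Ratio of the elderly rate to each younger group's rate.
ratios = {
    group: rates_2003["elderly"] / rate
    for group, rate in rates_2003.items()
    if group != "elderly"
}

# Ratio of the annual increase in incidence, AYA vs. elderly (4.1% vs. 1.7%).
apc_ratio = 4.1 / 1.7

for group, value in ratios.items():
    print(f"elderly vs. {group}: {value:.2f}")
print(f"AYA vs. elderly APC ratio: {apc_ratio:.2f}")
```

The unrounded ratios are approximately 4.64, 16.55, and 141.86, which correspond to the 4.6, 16.6, and 141.9 reported in the text.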

Trends by Race
There was a continued increase in both incidence and incidence-  (Figure S5). A turning point was found in the incidence for black patients and patients of other races around 2005 and 1993, respectively, but no change was found in their incidence-based mortality.

Trends by Sites
We observed two different patterns in the incidence analysis by site (Figure 6): (1) after an initial period of substantial and sustained growth, there has been a promising decline in recent years (head/neck, skin and soft tissue, gastrointestinal tract, genitourinary tract, and liver/pancreas); (2) the incidence has been on an almost continuous upward trend (skeletal tissue, respiratory system, hematologic system, breast tissue, and other). The incidence-based mortality at each site followed a similar pattern, except for the genitourinary tract, where there was no turning point in incidence-based mortality (Figure S6).

Survival Analysis
Overall median survival for patients with extranodal DLBCL was 93 months, with 1-, 5-, and 10-year survival rates of 75.767, 57.937, and 44.021%, respectively. Five-year survival rates improved dramatically from the 1970s to the 2010s (44.15 vs. 63.7%; Figure 7A). The most obvious increase in 5-year survival rate occurred in patients with a primary site in the head/neck (48.82 vs. 72.76%; Figure 7B). There was a slight, but not statistically significant, improvement in 5-year survival rate in patients with a primary site in the hematologic system (62.5 vs. 65.95%; Figure 7C). The changes in 5-year survival rates from the 1970s to the 2010s according to primary site are shown graphically in Figure S7.

Construction and Validation of the Nomograms
A total of 17,744 patients from the SEER-18 database with complete information, diagnosed from 2002 to 2015, were included in the construction of the nomograms (Table S1). Most of them were male, early stage, and white. The gastrointestinal tract, head/neck, and skin and soft tissue were the most common primary sites.
On multivariate analysis, male sex, older age, black race, advanced Ann Arbor Stage (III/IV), and primary sites in the skin and soft tissue, gastrointestinal tract, genitourinary tract, respiratory system, liver/pancreas, breast tissue, and other were associated with decreased survival (Table S2).
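As a reminder of how the hazard ratios in such a multivariate table relate to the underlying Cox model, the sketch below converts a regression coefficient and its standard error into an HR with a Wald 95% confidence interval. The coefficient and standard error are hypothetical; the actual estimates are those reported in Table S2.

```python
import math

def hazard_ratio_ci(beta, se, z=1.96):
    """HR = exp(beta); Wald CI = exp(beta -/+ z * se)."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper

# Hypothetical coefficient for advanced vs. early stage (not from Table S2).
beta_stage = math.log(2.0)
se_stage = 0.10
hr, lower, upper = hazard_ratio_ci(beta_stage, se_stage)
print(f"HR {hr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

An HR above 1 with a confidence interval that excludes 1 corresponds to a factor associated with decreased survival at the conventional significance level.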
Next, the factors closely related to survival on multivariate analysis were used to construct nomograms in R; the resulting prognostic signature for 1-, 5-, and 10-year overall survival is shown in Figure 8. The C-index and calibration plots, two established methods for assessing model performance, were applied to confirm whether the prognostic signature could predict the prognosis of patients with extranodal DLBCL. The C-index for OS was 0.708, indicating that the signature performed well and was able to predict the prognosis of patients with extranodal DLBCL. The calibration curves showed satisfactory consistency between the predicted values of the model and the observed values for 1-, 5-, and 10-year OS (Figures 9-11).

DISCUSSION
To our knowledge, this is the first large population-based study to describe the incidence and mortality trends of extranodal DLBCL by clinical features and to systematically compare them based on these characteristics, which may help scientists identify disease-related risk factors and better manage the disease. Appropriate management of extranodal DLBCL requires stratifying patients into distinct prognostic groups. Therefore, we developed a new prediction model for the rituximab era that may provide accurate individual risk stratification to help determine the best treatment options for each patient. We demonstrate that the incidence and incidence-based mortality of extranodal DLBCL increased for decades but have shown a promising downward trend in recent years. This phenomenon may be partly explained by our survival analysis. The 5-year survival rates improved dramatically from the 1970s to the 2010s (44.15 vs. 63.7%), and the most obvious increase occurred in patients with a primary site in the head/neck.
The overall mortality trend of extranodal DLBCL began to decline in 1993, 10 years earlier than incidence, indicating that the main reason for the decline in extranodal DLBCL mortality is improved survival. We hypothesize that improvements in the management and treatment of patients led to this improved survival. One of the most significant improvements was the introduction of the IPI as the gold standard for classifying high-risk and low-risk patients and guiding treatment (11). Another significant cause was the introduction of rituximab in the late 1990s, which had the potential to cure patients (12). In addition, a deeper and more comprehensive understanding of the genetics and molecular biology of DLBCL, such as BCL2 protein expression (13) and genetic complexity (14), may also be helpful in patient management and aid in improving survival.
It is worth noting that the mortality trend of early stage patients has been declining since 2007, whereas the mortality rate of advanced stage patients has only stabilized since 1993 after its initial rapid growth and has not shown a downward trend. This may suggest that new, more effective systemic treatments, in addition to R-CHOP regimens, are needed to reduce deaths among patients at an advanced stage. Chimeric antigen receptor modified T (CAR-T) cell therapy targeting the CD19 antigen has made breakthroughs in the treatment of advanced-stage NHL and has shown the possibility of cure (15). Axicabtagene ciloleucel (16) and tisagenlecleucel (17) have been approved by the FDA. Continued monitoring of extranodal DLBCL mortality may help to assess the effectiveness of clinical approaches and aid the development of new therapeutic approaches.
We showed a decline in primary gastrointestinal DLBCL mortality since 1993. A large number of studies have shown that Helicobacter pylori (H. pylori) and Campylobacter jejuni are closely related to primary gastrointestinal DLBCL (18)(19)(20)(21). In 1993, Wotherspoon found that there was a high incidence of H. pylori infection among patients with gastric lymphoma, and 5 out of 6 patients achieved complete remission after eradication of H. pylori infection (18,22). Chronic enteritis and gastritis secondary to Campylobacter jejuni and H. pylori have been identified as an important predisposition to primary gastrointestinal lymphoma (23). In recent years, endoscopic ultrasonography has made it possible to follow such patients over the long term, classify the disease stage, and eradicate the infection in time (24). These advances may be associated with the large decline in the incidence of primary gastrointestinal DLBCL from 2005. In addition, the dramatic decline in the incidence and mortality of primary liver/pancreas DLBCL since 2001 and 2011, respectively, may be related to the following: (1) the recognition that the hepatitis C virus is a key factor in the development of primary liver DLBCL; (2) the availability of advanced methods for the effective prevention and control of the hepatitis C virus (25)(26)(27).
In our analysis, the incidence and mortality of extranodal DLBCL in men were twice those in women, and the annual growth rate in men was about 1.5 times that in women, which is consistent with previous studies (28). This is reminiscent of the fact that men are at higher risk for most cancers. Although the underlying causes are unknown (29), there are several possible reasons. (1) After the reporting of the initial phase 3 trials, data suggested that females responded better to rituximab than males (30)(31)(32). The speed at which rituximab is cleared from the body is the key cause of this phenomenon (33). Data from the RICOVER-60 trial showed that elderly females, who gained the greatest benefit from rituximab, had a statistically significantly slower clearance of rituximab, resulting in longer exposure time and higher serum levels, in relation to an age-dependent decrease in rituximab clearance in females (32). Elderly males, in contrast, have a faster clearance of rituximab, which leads to suboptimal dosing when rituximab is given at 375 mg/m². (2) Men and women differ in occupational exposures, health awareness, and lifestyle (28). For example, smoking has been shown to be a predisposing factor for non-Hodgkin's lymphoma in a dose-dependent manner (34). (3) The immune response has been shown to be closely related to lymphomagenesis (35), and women have stronger humoral and cellular immunity than men (36). (4) Men and women differ hormonally. For example, reproductive hormones can regulate the immune response in a variety of ways (37). (5) Men are more susceptible to H. pylori infection, which is closely related to primary gastrointestinal lymphoma (38).
In our article, we showed that the incidence of extranodal DLBCL increased with age, which was consistent with the theory that the incidence increased exponentially with age in the multistep carcinogenic model of solid tumors (39). Two factors may account for this result: (1) With age, the immune system of the elderly gradually declined and the incidence of chronic inflammation gradually increased. For example, the prevalence of EBV increased with age. Continuous stimulation by EBV leads to T-cell exhaustion that favors telomere attrition and immune senescence (40). In addition, EBV was an important susceptibility factor leading to extranodal DLBCL (41,42). (2) Some genes and molecules would change with age (14). For example, the elderly were more likely to have higher levels of BCL2 than younger patients (43).
FIGURE 8 | Nomograms of patients with extranodal DLBCL for predicting overall survival.
The death rate for white patients has declined significantly since 2002, while that for black patients and other ethnic groups has remained virtually unchanged. One possible explanation comes from a report showing that black patients and patients of other races had less access to rituximab than white patients in 2002, when rituximab was first used (44). Another reason may be that the number of black patients and patients of other ethnic groups in our study was too small to detect a significant decline in mortality. More large population-based studies are needed to explore genetic and molecular differences between extranodal DLBCL patients of different races.
We showed that extranodal DLBCL patients with a primary site in the head/neck had the most obvious increase in 5-year survival rate from the 1970s to the 2010s (48.82 vs. 72.76%), which is consistent with previous studies (45)(46)(47). The possible reasons are as follows: (1) Knowledge of extranodal lymphoma of the head and neck remained scarce in the 1970s and 1980s (48), whereas many studies on this subgroup of lymphoma are now being published in view of its heterogeneous nature (47). (2) Rituximab was added to conventional CHOP or CHOP-like regimens for DLBCL in the late 1990s, which has led to considerable improvement in patients' survival. Furthermore, the development of novel drugs for lymphoma of distinct pathological types has resulted in a higher probability of cure (49). (3) The head and neck area, which can be easily approached and assessed by palpation or endoscopy, is a frequent site for tissue confirmation. Head and neck surgeons are often consulted for biopsies and prognostic assessment of suspected extranodal lymphomas. As a result, primary DLBCL of the head and neck may be diagnosed and treated early in the clinical setting. It has been previously reported that DLBCL patients with extranodal sites in the head and neck experienced longer survival than those with nodal lymphoma. Additionally, the extranodal group also had longer disease-specific survival than the nodal group with extranodal involvement of other sites (45).
At present, the most routinely used prognostic system in DLBCL patients is the International Prognostic Index (IPI). However, the IPI involves multiple variables, and prognosis is heterogeneous within some of the risk groups it defines. As a graphic expression of a mathematical model, the nomogram helps to determine the probability of a clinical event by combining biological and clinical variables. Nomograms are widely used in various types of cancers (50,51). There is increasing evidence that primary extranodal sites reflect distinct clinical features and prognostic implications and require specific therapy (8,9). The number of extranodal sites is one of the evaluation criteria of the IPI score. However, Lu CS et al. demonstrated that some specific extranodal sites have better predictive value than the number of extranodal sites involved (7). Therefore, primary site, an independent prognostic factor in the multivariate Cox analysis, was added to this model. The prognostic value of this model was verified by the C-index and the calibration curves. The C-index for OS (0.708) indicated that the model has good discriminative ability. The calibration curves displayed good agreement between the predicted values and the observed outcomes, which supports the reliability of the risk stratification signature. Compared with the IPI, the new risk stratification signature may divide extranodal DLBCL patients into high-risk and low-risk groups more accurately, which could help decrease over-treatment in low-risk patients and promote effective therapies in high-risk patients to prevent death from this disease. Moreover, the nomograms can compute an individual's 1-, 5-, and 10-year survival rates and may therefore help the patient's doctor develop a more rational follow-up schedule.
Some limitations due to information availability in the SEER database should be noted when interpreting our findings. First, information regarding therapy is limited. In SEER, there is no information about chemotherapy, which is a major modality of treatment for DLBCL. Therefore, we could not explore the effect of treatment on survival or on the trends of incidence and mortality. We partially addressed this issue by dividing the time of diagnosis into the 1970s, 1980s, 1990s, 2000s, and 2010s. Second, the incidence-based mortality rates by Ann Arbor stage may be underestimated, because the SEER database has recorded Ann Arbor stage only since 1983, which shortened the observable interval between diagnosis and death. Third, information about radiation exposure, environmental exposures, individual lifestyle, and family history, which may be associated with the rates of incidence and mortality, was not available. Fourth, there was no record of baseline performance status, B-symptoms, bulky disease, or lactate dehydrogenase levels. Therefore, we could not include these potential prognostic factors in the prediction model.

CONCLUSION
Our study shows that the incidence and incidence-based mortality of extranodal DLBCL increased for decades but have shown a promising downward trend in recent years. The findings may provide new insight for improving healthcare quality and the management of extranodal DLBCL. Moreover, the 5-year survival rates have improved dramatically from the 1970s to the 2010s (44.15 vs. 63.7%), and the most obvious increase occurred in patients with a primary site in the head/neck. We also constructed a nomogram, a robust and clinically practical risk stratification tool for the rituximab era, which may help guide treatment and develop individualized follow-up programs.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: the Surveillance, Epidemiology and End Results database.

ETHICS STATEMENT
This study was conducted in full compliance with the publication guidelines provided by SEER. The data were obtained from SEER, so the approval of an ethics committee was not needed.

AUTHOR CONTRIBUTIONS
XY collected and analyzed the data and wrote the paper. AX, FF, ZH, QC, and LZ researched the literature and edited and revised the paper. CS and YH conceived and designed this study, analyzed the data, and wrote the paper. All authors reviewed the paper and approved the final manuscript.

ACKNOWLEDGMENTS
We would like to thank the researchers and study participants for their contributions.