External validation and comparison of IPI, R-IPI, and NCCN-IPI in diffuse large B-cell lymphoma patients treated with R-CHOP to predict 2-year progression-free survival

ABSTRACT Background Diffuse large B-cell lymphoma (DLBCL) is the most common aggressive lymphoma. The standard first-line therapy for DLBCL consists of rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP). About 50–70% of patients may be cured by R-CHOP. There were no data on external validation and comparison of the international prognostic index (IPI), revised-IPI (R-IPI), and enhanced-IPI (NCCN-IPI) to predict treatment outcomes in a middle-income country with a resource-limited setting. Objectives We aimed to externally validate and compare IPI, R-IPI, and NCCN-IPI in predicting 2-year progression-free survival (2-y PFS) of newly diagnosed DLBCL patients treated with R-CHOP. Methods This ambispective observational study recruited consecutive patients diagnosed between 1 January 2014 and 30 June 2020 at Thammasat University Hospital and Ramathibodi Hospital, with the last follow-up on 1 July 2022. We assessed discrimination by Harrell's concordance index (c-index), calibration by calibration plot, and the absolute difference in survival (ADS) between the lowest- and highest-risk groups. Results The cohort of 292 patients (median age 63 years and median follow-up 3.6 years) had 131 progressions and 96 deaths. The 2-y PFS was 63%. The c-indices were NCCN-IPI 0.6216, R-IPI 0.6004 (P = 0.215), and IPI 0.6104 (P = 0.463), with P values for the comparison with NCCN-IPI. The calibration plots of NCCN-IPI and R-IPI showed nearly perfect agreement (moderate strength), while IPI showed miscalibration. The ADSs were NCCN-IPI 52%, R-IPI 42%, and IPI 25%. Conclusion As in prior studies, NCCN-IPI was the best prognostic index compared to IPI and R-IPI. However, the prognostic model for DLBCL patients treated with R-CHOP requires updating or integrating biomarkers to improve discrimination to the acceptable level (c-index 0.7).


Introduction
Diffuse large B-cell lymphoma (DLBCL) is the most common aggressive lymphoma.
The three common prognostic models are the international prognostic index (IPI) [2], revised-IPI (R-IPI) [3], and enhanced IPI using data from the National Comprehensive Cancer Network (NCCN-IPI) [4]. These indices help discriminate between patients who are more likely or less likely to be cured by standard therapy. All three models are calculated from five variables collected at diagnosis: age, stage, lactate dehydrogenase (LDH), Eastern Cooperative Oncology Group performance status (ECOG), and extranodal sites. They stratify patients into risk groups to predict treatment outcomes. IPI and NCCN-IPI stratify patients into four risk groups (low, low-intermediate [LI], high-intermediate [HI], and high). R-IPI has only three risk groups (very good, good, and poor). NCCN-IPI had the largest absolute difference in survival (ADS) between the lowest- vs highest-risk groups (5- [3]. The transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement advocates discrimination and calibration as measures to assess model performance [5]. Discrimination is the ability to differentiate between those who do and those who do not experience the outcome. In a survival model, discrimination is measured and reported as a concordance index (c-index). Calibration reflects the agreement between observed and predicted outcomes and is graphically demonstrated as a calibration plot with predictions on the x-axis and the outcome on the y-axis [5,6].
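The way the three indices turn the same five variables into risk groups can be illustrated with a short sketch. The point values and cut-offs below are the commonly cited ones from the original index publications [2-4]; they are not restated in this article, so they should be read as assumptions and checked against those sources before any use. The `Patient` class and all function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int                      # years at diagnosis
    stage: int                    # Ann Arbor stage, 1-4
    ldh_ratio: float              # LDH divided by the upper limit of normal
    ecog: int                     # ECOG performance status, 0-4
    extranodal_sites: int         # number of extranodal sites (IPI, R-IPI)
    major_organ_extranodal: bool  # BM, CNS, liver/GI tract, or lung (NCCN-IPI)


def ipi_score(p: Patient) -> int:
    """One point per dichotomized adverse factor (assumed cut-offs, range 0-5)."""
    return sum([p.age > 60, p.stage >= 3, p.ldh_ratio > 1,
                p.ecog >= 2, p.extranodal_sites > 1])


def ipi_group(score: int) -> str:
    """Four IPI groups: 0-1 low, 2 LI, 3 HI, 4-5 high."""
    return ["low", "low", "LI", "HI", "high", "high"][score]


def r_ipi_group(score: int) -> str:
    """R-IPI regroups the same IPI score into three groups."""
    if score == 0:
        return "very good"
    return "good" if score <= 2 else "poor"


def nccn_ipi_score(p: Patient) -> int:
    """Age and LDH are scored incrementally (assumed cut-offs, range 0-8)."""
    age_pts = 0 if p.age <= 40 else 1 if p.age <= 60 else 2 if p.age <= 75 else 3
    ldh_pts = 0 if p.ldh_ratio <= 1 else 1 if p.ldh_ratio <= 3 else 2
    other_pts = sum([p.stage >= 3, p.major_organ_extranodal, p.ecog >= 2])
    return age_pts + ldh_pts + other_pts


def nccn_ipi_group(score: int) -> str:
    """Four NCCN-IPI groups: 0-1 low, 2-3 LI, 4-5 HI, 6-8 high."""
    if score <= 1:
        return "low"
    if score <= 3:
        return "LI"
    return "HI" if score <= 5 else "high"


# Hypothetical patient: 63 years, stage III, LDH 1.5x normal, ECOG 1.
example = Patient(age=63, stage=3, ldh_ratio=1.5, ecog=1,
                  extranodal_sites=1, major_organ_extranodal=False)
print(ipi_group(ipi_score(example)),
      r_ipi_group(ipi_score(example)),
      nccn_ipi_group(nccn_ipi_score(example)))
```

The sketch makes the structural difference visible: IPI and R-IPI share one dichotomized score and differ only in grouping, whereas NCCN-IPI assigns graded points to age and LDH.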
Three previous external validation and comparison studies reported only discrimination and ADS but not calibration. NCCN-IPI had the highest c-index in all of these studies: for 5-y overall survival (OS) in Chinese patients, NCCN-IPI 0.654 and IPI 0.611 [7]; for OS in a population-based Nordic registry, NCCN-IPI 0.67, IPI 0.65, and R-IPI 0.63 [8]; and for OS in multicenter clinical trials from European countries and North America, NCCN-IPI 0.632, IPI 0.626, and R-IPI 0.590 [9]. NCCN-IPI also yielded the highest ADS in predicting 5-y OS (NCCN-IPI 43% [49% vs 92%], IPI 34% [54% vs 88%], and R-IPI 32% [61% vs 93%]) [9]. The 5-y OS has long been a standard treatment endpoint, but it requires extended follow-up time and may delay novel drug trials [10]. The risk of progression is highest within the first two years [10,11], and those who remained progression-free at 2 years were clinically indistinguishable from the general population [10,12]. The 2-y progression-free survival (2-y PFS) has a strongly positive correlation with 5-y OS (r = 0.858) [10]. Thus, we used 2-y PFS as the endpoint.
We aimed to externally validate and compare IPI, R-IPI, and NCCN-IPI in predicting 2-year progression-free survival (2-y PFS) of newly diagnosed DLBCL patients treated with R-CHOP in a resource-limited setting.

Materials and methods
This observational cohort study ambispectively recruited consecutive patients diagnosed with de novo DLBCL, not otherwise specified (NOS), according to the World Health Organization (WHO) classification [13,14]. Recruitment was conducted between 1 January 2014 and 30 June 2020 at two university hospitals (Department of Pathology, Thammasat University Hospital and Department of Pathology, Ramathibodi Hospital). The retrospective group was diagnosed from 1 January 2014 to 31 December 2017, and the prospective group from 1 January 2018 to 30 June 2020. Eligible patients were at least 18 years old and received R-CHOP as first-line therapy. Patients without ECOG, LDH, or staging data at diagnosis were to be excluded. The final follow-up date was 1 July 2022.
Data regarding diagnosis, treatment, follow-up physical examination, laboratory findings, and radiologic findings were collected from the electronic medical databases. The date of death of those who were lost to follow-up was available from the civil registration, which is merged with the reimbursement system.
Stage and response were assessed by physical examination, bone marrow (BM) biopsy, and computed tomography (CT) or fluorodeoxyglucose (FDG) positron emission tomography (PET)-CT. Response was classified according to the Lugano criteria as complete remission (CR), partial response (PR), stable disease (SD), or progressive disease (PD) [15]. PET-CT was not available for all patients because of its high cost and limited indications for reimbursement [16]. Progression-free survival (PFS) was the time from the date of diagnosis to the earliest occurrence of refractory disease, progressive disease, or death from any cause. The 2-year PFS meant being alive and progression-free two years after the diagnosis. We calculated the sample size on the assumption that at least 100, and optimally 200, progressions [17] would occur during follow-up, comprising 30-50% of the cohort. Thus, the number of participants would be 300-600 patients. Predictor variables (age, LDH, stage, ECOG, and extranodal disease) were required for the complete-case analysis.
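The PFS definition above can be made concrete with a minimal sketch. It assumes that each patient record holds a diagnosis date, an optional date of first refractory or progressive disease, an optional date of death, and a last follow-up date; the function names and the 365.25-day year are illustrative choices, not details taken from the study.

```python
from datetime import date
from typing import Optional, Tuple


def pfs_time_and_event(diagnosis: date,
                       progression: Optional[date],
                       death: Optional[date],
                       last_follow_up: date) -> Tuple[float, bool]:
    """Return (PFS time in years, event indicator).

    The event is the earliest of refractory/progressive disease or death
    from any cause; patients with neither are censored at last follow-up.
    """
    event_dates = [d for d in (progression, death) if d is not None]
    if event_dates:
        end, event = min(event_dates), True
    else:
        end, event = last_follow_up, False
    return (end - diagnosis).days / 365.25, event


def progression_free_at_2y(time_years: float, event: bool) -> Optional[bool]:
    """Classify 2-y PFS status; None means censored before two years."""
    if time_years >= 2:
        return True   # still alive and progression-free at the 2-year mark
    if event:
        return False  # progressed or died within two years
    return None       # follow-up too short to determine the status


# Hypothetical patient who progressed one year after diagnosis.
t, e = pfs_time_and_event(date(2018, 1, 1), date(2019, 1, 1), None, date(2022, 7, 1))
print(round(t, 2), e, progression_free_at_2y(t, e))
```

Returning `None` for patients censored before two years keeps them out of a naive proportion; a time-to-event estimator such as Kaplan-Meier is what handles them properly.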
The survival curves were estimated and graphed with the Kaplan-Meier (KM) method and compared by the non-parametric log-rank test. The predicted risk for each patient was calculated according to the IPI, R-IPI, and NCCN-IPI.
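For readers unfamiliar with the KM method, the following is a minimal pure-Python sketch of the product-limit estimator and of reading off survival at a fixed time such as two years. It is not the software used in this study, it omits confidence intervals and the log-rank test, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather everyone whose follow-up ends at time t (events and censored).
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_removed += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / n_at_risk
            curve.append((t, surv))
        # Patients censored at t were still at risk at t, so remove them afterwards.
        n_at_risk -= n_removed
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup of S(t) from a Kaplan-Meier curve."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Toy data: PFS times in years and event indicators (True = progression or death).
times = [0.5, 1.0, 1.5, 2.5, 3.0]
events = [True, False, True, False, False]
curve = kaplan_meier(times, events)
print(round(survival_at(curve, 2.0), 3))  # 0.8 * (1 - 1/3) = 0.533
```

Applying `survival_at(curve, 2.0)` separately to each risk group gives the group-specific 2-y PFS estimates that the curves display.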
We externally validated and compared the performance of the three models in predicting 2-y PFS by discrimination, calibration, and ADS. Discrimination was measured by Harrell's concordance index (c-index) [18], which reflects how well the model ranks patients: among comparable pairs, it is the proportion in which the patient who progressed earlier had the higher predicted risk. A c-index of 1 is perfect, 0.7 is acceptable, 0.6 is poor, and 0.5 is no better than a coin flip. We estimated the difference in the c-index by a linear combination [19]. The calibration plot, based on the flexible parametric model and the Cox regression-derived hazard, demonstrates how closely the predicted progression agreed with the observed progression. The ADS was obtained by subtracting the progression-free survival of the highest-risk group from that of the lowest-risk group [4,9].
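The two simpler measures above, the c-index and the ADS, can be sketched in a few lines. This is a simplified illustration rather than the study's statistical code: pairs with tied event times are skipped, ties in predicted risk count as half-concordant (which matters for grouped scores such as the IPI), and no standard error is computed. The function names and toy data are hypothetical; the ADS example reuses the 5-y OS figures quoted from [9] in the Introduction.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[bool],
                    risk: Sequence[float]) -> float:
    """Harrell's c-index for right-censored data (simplified tie handling)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the shorter time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier event had the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def absolute_difference_in_survival(lowest_risk_pct: float,
                                    highest_risk_pct: float) -> float:
    """ADS: survival in the lowest-risk group minus the highest-risk group."""
    return lowest_risk_pct - highest_risk_pct


# Toy data: the risk score could be, for example, an NCCN-IPI score.
times = [1, 2, 3, 4]
events = [True, True, False, True]
risk = [3, 2, 2, 1]
print(harrell_c_index(times, events, risk))     # 4.5 concordant of 5 pairs = 0.9
print(absolute_difference_in_survival(92, 49))  # NCCN-IPI 5-y OS in [9]: 43
```

Because the indices produce only a handful of distinct scores, many pairs are tied on predicted risk, which is one reason grouped indices tend to have modest c-indices.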
The Human Research Ethics Committee of Thammasat University No. 1 (Faculty of Medicine) and that of Mahidol University (Faculty of Medicine, Ramathibodi Hospital) reviewed and approved the study (Thammasat: No. COA 055/2561, dated 25 February 2018; Ramathibodi: MURA2018/85, dated 23 March 2018) in compliance with the Declaration of Helsinki and the International Conference on Harmonization of Good Clinical Practice. This observational study collected data from the electronic medical record. There was no intervention other than routine clinical practice, and no more than minimal risk was involved. Consent was waived for the study.

Results
The study enrolled 292 patients diagnosed with de novo DLBCL, NOS and treated with R-CHOP. All patients had ECOG, LDH, and staging data at diagnosis, so none were excluded. The median age was 63 years (range 20-95), with 57% older than 60 years. Those aged at least 80 years (n = 20) received R-mini-CHOP. Many of our patients had elevated LDH (67%), stage III/IV disease (59%), and non-germinal center B-cell type (non-GCB) (65%). Staging at diagnosis was based on physical examination, CT, and BM biopsy. Response assessment imaging was by PET-CT in 56 patients (19%) and by CT in the remaining patients. The median follow-up time was 3.6 years (range, 0.1-8.4). One patient received the first cycle of R-CHOP, had a stroke, was switched to palliative care, and was then lost to follow-up.

Survival curve
The survival curves of all three IPIs showed a significant difference between risk groups (IPI log-rank P < .001, R-IPI P < .001, and NCCN-IPI P < .001). However, the survival curves of IPI overlapped between the HI and high-risk groups, whereas those of the R-IPI and NCCN-IPI risk groups were well separated (Figure 3).

Calibration
The calibration plots of all three models lay close to or on the 45-degree diagonal line, especially those of R-IPI and NCCN-IPI, indicating nearly perfect agreement between mean predicted and mean observed 2-y progression according to risk groups. The strength of calibration of R-IPI and NCCN-IPI was moderate. The IPI showed overestimation in high-risk (0.55 vs 0.45) and low-risk (0.25 [...] setting (calibration slope IPI 0.969, R-IPI 0.968, and NCCN-IPI 0.972) (Figure 4).

Discussion
NCCN-IPI, IPI, and R-IPI are widely accepted and easy to use. They apply the same readily available predictive clinical factors. Old age at diagnosis is associated with unfavorable molecular features such as the activated B-cell gene expression profile subtype and cytogenetic complexity [20]. Elevated LDH is a poor predictive factor of survival outcome [21]. NCCN-IPI categorizes age and LDH into incremental scores to capture the increased risk of mortality based on the Cox proportional hazard (CPH) model [4], whereas IPI and R-IPI dichotomize them [2,3]. In European countries, North America, and China, NCCN-IPI was more efficient in predicting prognosis than IPI and R-IPI [4,7-9].
No previous external validation study of prognostic models in DLBCL has assessed and reported both discrimination and calibration according to the TRIPOD checklist. Our results represent current real-world practice in the resource-limited setting of developing countries. The study's limitations are the lack of PET-CT in staging, the low availability (19%) of PET-CT in response assessment, and the retrospective design in 50% of the cohort. Assessment by CT may understage patients and introduce a confounding effect. The retrospective design could lead to selection bias. Compared to previous studies, our participants had a higher proportion of elevated LDH (67% vs 52% [7] to 59% [9]) and a different outcome measure (PFS vs OS [4,7-9]). Our population was similar to those of previous studies in age (median 63-67 years, age range 19-95 years, and proportion aged >60 years from 56 to 60%), stage III-IV (55-66%), extranodal disease by NCCN-IPI (38%), ECOG 2-4 (26-37%), and high-risk NCCN-IPI (4-14%) [7-9]. In predicting 2-y PFS, the NCCN-IPI had the highest c-index (0.6216), which was not significantly different from that of R-IPI (0.6004, P = .215) or IPI (0.6104, P = .463). The c-indices for OS in previous studies ranged from 0.632 [9] to 0.67 [8]. The discrimination of a prognostic prediction model with a c-index of around 0.6 is poor. Four parsimonious model-updating studies reported a c-index higher than 0.7: GELTAMO-IPI [22], Lipo-PI [23], Kang [24], and CPH [8].
The CPH model incorporates the same five factors as NCCN-IPI while maintaining the linearity of the continuous variables age and normalized LDH, and it is implemented in an interactive web page for Nordic countries (lymphomapredictor.org). The CPH model increased the c-index in the prediction of OS (c-index CPH 0.73 vs NCCN-IPI 0.67) [8]. Because the baseline survival of the CPH model is derived from the Nordic population, its use is limited in other geographic areas.
We concluded that, as in prior studies, NCCN-IPI is the best prognostic index compared to IPI and R-IPI. However, the prognostic model for DLBCL patients treated with R-CHOP requires updating or integrating biomarkers to improve discrimination to the acceptable level (c-index 0.7). Suggestions for future external validation in the developing-country setting are recruiting participants through a multicentre study, prospective data collection, PET-CT staging and response assessment, and the addition of baseline beta-2 microglobulin (B2M) or other potential biomarkers.
Pimjai Niparuck is an Associate Professor of Hematology at the Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Thailand. Her most cited publication is entitled 'Incidence of thromboembolism in patients with COVID-19: A systematic review and meta-analysis'.
Paisarn Boonsakan is an Assistant Professor of Anatomic Pathology at Ramathibodi Hospital, Mahidol University, Thailand. His most cited publication is entitled 'Extranodal NK/T-cell lymphoma, nasal type, includes cases of natural killer cell and αβ, γδ, and αβ/γδ t-cell origin: A comprehensive clinicopathologic and phenotypic study'.
Prapasri Kulalert is an instructor at the Department of Clinical Epidemiology, Faculty of Medicine, Thammasat University, Thailand. Her most cited article is entitled 'Evaluating the impact of allergic rhinitis on quality of life among Thai students'.
Wasithep Limvorapitak is an Associate Professor of Hematology at the Faculty of Medicine, Thammasat University, Thailand. His most cited publication is entitled 'Outcomes of intermediate risk karyotype acute myeloid leukemia in first remission undergoing autologous stem cell transplantation compared with allogeneic stem cell transplantation and chemotherapy consolidation: A retrospective, propensity-score adjusted analysis'.
Lantarima Bhoopat is an instructor at the Hematology Division, Department of Internal Medicine, Faculty of Medicine, Thammasat University, Thailand. Her most cited publication is entitled 'Low vegetable intake is strongly associated with venous thromboembolism in Thai population'.
Supawee Saengboon is an instructor at the Hematology Division, Department of Internal Medicine, Faculty of Medicine, Thammasat University, Thailand. Her most cited publication is entitled 'Spontaneous heparin-induced thrombocytopaenia with adrenal haemorrhage following orthopaedic surgery: a case report and literature review'.
Pichika Chantrathammachart is an instructor at the Division of Hematology, Department of Medicine, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Thailand. Her most cited publication is entitled 'Tumor-derived tissue factor activates coagulation and enhances thrombosis in a mouse xenograft model of human pancreatic cancer'.
Teeraya Puavilai is an instructor at the Division of Hematology, Department of Medicine, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Thailand. Her most cited publication is entitled 'Extranodal NK/T-cell lymphoma, nasal type, includes cases of natural killer cell and αβ, γδ, and αβ/γδ t-cell origin: A comprehensive clinicopathologic and phenotypic study'.
Suporn Chuncharunee is an Associate Professor of Hematology at the Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Thailand. His most cited publication is entitled 'Panobinostat plus bortezomib and dexamethasone versus placebo plus bortezomib and dexamethasone in patients with relapsed or relapsed and refractory multiple myeloma: a multicentre, randomised, double-blind phase 3 trial'.