Transformation Scoring System (TSS): A new assessment index for clinical transformation of follicular lymphoma

Abstract Although histologic analysis is the gold standard for diagnosing follicular lymphoma (FL) transformation, many patients are diagnosed with transformation by clinical factors as biopsy specimens often cannot be obtained. Despite the frequency of clinical diagnosis, no clinical assessment tool has yet been established for FL transformation in the rituximab era. We derived and validated a transformation scoring system (TSS) based on retrospective analyses of 126 patients with biopsy‐proven FL and histologic transformation (HT) at two hospitals of the National Cancer Center of Japan. In the derivation set (76 patients), detailed analyses of the clinical characteristics at disease progression showed that lactate dehydrogenase (LDH) elevation, focal lymph nodal (LN) enlargement, hemoglobin <12 g/dl, and poor performance status (PS) (2‐4) were associated with HT. The weights of these variables were decided based on the regression coefficients. Next, we constructed a TSS encompassing the above four factors: LDH, (> upper limit of normal [ULN], ≤ULN ×2) (1 point), (≥ULN ×2) (2 points); focal LN enlargement, (≥3 cm, <7 cm) (1 point), (≥7 cm) (2 points); hemoglobin <12 g/dl (1 point); poor PS (2 points). We identified a high positive predictive value (PPV) (96.4%) and negative predictive value (NPV) (85.4%) for diagnosing HT when a cutoff score of 2 was selected for our TSS. In an external validation set (50 patients), the probability of HT was high with scores ≥2 (PPV, 93.3%; NPV, 82.9%). We developed a TSS that offers a simple yet valuable tool for diagnosing HT, especially in patients who cannot undergo biopsy.

Although histologic confirmation by biopsy is the gold standard for diagnosing transformation, 27 it is not always possible to obtain a biopsy specimen (e.g., when disease progression occurs in an inaccessible location or develops very rapidly). Of note, even in prospective studies, 16,21,28 a biopsy specimen could not be obtained in 60%-80% of the patients at the time of disease progression. Moreover, a previous study reported that more than half of the FL patients with transformation were diagnosed based only on clinical criteria and at the physician's discretion, without histologic confirmation. 15 In addition, limited information is available regarding the clinical factors at the time of disease progression that are associated with transformation. 17,18,23,24 Although clinical criteria for transformation were proposed in the pre-rituximab era, 24 a recent retrospective study indicated that such criteria may not be reliable in the rituximab era. 18 Furthermore, to the best of our knowledge, no study has been conducted in the rituximab era to compare and statistically identify the clinical factors associated with disease progression in patients with biopsy-proven FL and HT.
Therefore, the present study conducted a retrospective analysis at two hospitals of the National Cancer Center of Japan to develop a transformation scoring system (TSS) for the diagnosis of the clinical transformation of FL that would be easy to use in both daily practice and clinical trials.

| Study design
This retrospective study used derivation and validation patient cohorts to develop criteria defining the clinical transformation of FL. Patients initially diagnosed with FL (grades 1, 2, or 3a) according to the World Health Organization classification 29,30 were included. Patients with grade 3b FL or composite lymphoma (i.e., confirmed to have both FL and diffuse large B-cell lymphoma [DLBCL]) at initial diagnosis were excluded.
To assess the definition of clinical transformation of FL, we retrospectively analyzed patients who were initially diagnosed with FL (grades 1, 2, or 3a) and underwent biopsy at the time of disease progression at the National Cancer Center Hospital (NCCH) between 2000 and 2016. Using this cohort of patients (the derivation cohort), we investigated the clinical characteristics at the time of disease progression and constructed a TSS based on clinical covariates identified by a multivariate logistic regression model.
To validate the TSS, we retrospectively analyzed two cohorts of patients who were initially diagnosed with FL (grades 1, 2, or 3a). The first cohort comprised patients who did not undergo biopsy at the time of disease progression and who were diagnosed at the NCCH between 2000 and 2016 (the internal validation cohort). The second cohort comprised patients who underwent biopsy at the time of disease progression and were diagnosed at the NCCH-East (NCCHE) between 2003 and 2014, forming a completely independent cohort (the external validation cohort). We applied the TSS to both cohorts.
This study was approved by the Institutional Review Board of the National Cancer Center and was conducted in accordance with the principles of the Declaration of Helsinki.

| Definition of transformation
HT was defined based on biopsy confirmation involving both an increase in the number of large cells and a loss of follicular structure. Progression from grade 1 and 2 to grade 3 was not included in HT. Only biopsy-proven transformation from FL to DLBCL was included as HT; transformations from FL to other histological types (Burkitt or Hodgkin lymphoma) were excluded.

| Statistical analyses
Categorical variables were compared using Fisher's exact test. The probability of overall survival (OS) was calculated using the Kaplan-Meier method, and the groups were compared using the log-rank test. The OS from disease progression was defined as the duration from disease progression to death from any cause or the date of the last follow-up. The cumulative incidence of HT was calculated using Gray's method. In a competing risk model for HT, death before HT was defined as a competing risk. The time to HT was calculated as the duration between the date of initial diagnosis of FL and the occurrence of HT. Clinical data for each patient were extracted from the patient's medical records. A two-sided p-value <0.05 was considered statistically significant. Variables significantly associated with HT in univariate analysis were included in the multivariate logistic regression model. Clinical stage was determined according to the Ann Arbor classification system. Focal lymph nodal (LN) enlargement was defined as a nodal mass larger than 3 cm observed in only one nodal area, with the nodal masses in all other nodal areas smaller than 3 cm. The nodal area was defined according to the Follicular Lymphoma International Prognostic Index (FLIPI). 31 Focal LN enlargement was also assessed at a larger diameter threshold (nodal mass ≥7 cm). Bulky disease was defined as a nodal mass ≥6 cm in diameter, regardless of the number of nodal areas. The maximum standardized uptake value (SUVmax) was assessed for patients who received 18F-fluorodeoxyglucose positron emission tomography/computed tomography (FDG-PET/CT). The TSS scores were calculated from the regression coefficient of each statistically significant variable. Receiver operating characteristic (ROC) curve analysis was used to assess the accuracy of the TSS and SUVmax; cutoff values were chosen to yield a high positive predictive value (PPV) and negative predictive value (NPV).
Statistical analyses were performed using the EZR software package, version 1.32 (Saitama Medical Center, Jichi Medical University, Saitama, Japan), which is a graphical user interface for R (The R Foundation for Statistical Computing, version 3.2.4). 32
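The definition of focal LN enlargement given above can be written as a small rule. The following Python sketch is illustrative only and is not the authors' code: the function name, the input format (a mapping from nodal area to the largest nodal mass diameter in cm), and the handling of a mass of exactly 3 cm are assumptions. The text defines the threshold as "larger than 3 cm", whereas the scoring bands use ≥3 cm; the sketch follows the scoring bands.

```python
from typing import Mapping


def focal_ln_category(diam_cm_by_area: Mapping[str, float]) -> str:
    """Classify focal LN enlargement from the largest mass diameter per nodal area.

    Returns "none", "3-7cm", or ">=7cm".
    Assumption: a mass of exactly 3.0 cm counts as enlarged (>= 3 cm).
    """
    enlarged = [d for d in diam_cm_by_area.values() if d >= 3.0]
    # "Focal" means exactly one nodal area carries a mass of 3 cm or more.
    if len(enlarged) != 1:
        return "none"
    return ">=7cm" if enlarged[0] >= 7.0 else "3-7cm"
```

For example, a 4.2 cm mass in one nodal area with only small nodes elsewhere would fall into the "3-7cm" category, whereas masses of 4 cm and 5 cm in two different nodal areas would not count as focal.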

| Development of the transformation scoring system in the derivation set
Patient selection flowcharts are shown in Figure 1A-B. During the study period, 459 patients were diagnosed with FL (grades 1, 2, or 3a) at the NCCH (Figure 1A). The median duration of follow-up among these patients was 7.1 (range: 0.2-16.6) years. Disease progression was observed in 184 patients, among whom 80 (43%) had histologic documentation (FL in 42, HT with DLBCL in 34, and HT other than DLBCL in 4). Finally, we identified 76 patients with biopsy-proven FL or HT with DLBCL as subjects for the derivation analysis. In this cohort, first-line treatment was similar between the FL and HT groups: 22 patients (28.9%; FL in 11 and HT in 11) were initially managed with watchful waiting (WW), 45 patients (59.2%; FL in 24 and HT in 21) were immediately treated with rituximab-containing therapy, and nine patients (11.8%; FL in 7 and HT in 2) were immediately treated with local radiotherapy. Further, similar numbers of patients in both groups had received R-CHOP therapy before disease progression (FL in 20 and HT in 20).
The clinical characteristics of 76 patients with biopsy-proven FL or HT at the time of disease progression are shown in Table 1. In Table 1, the calculation of the total number of patients (FL patients + HT patients) in the "Number of relapses from initial diagnosis" has been corrected from "14 (18.4) [...] (Table 2). We constructed the scoring system consisting of the abovementioned four factors; the weights of the variables were decided based on the regression coefficients. To assess the cutoff value that best distinguished HT from FL, we used ROC curve analysis (Figure 2A). The area under the ROC curve (AUC) was high (0.91, 95% confidence interval [CI]: 0.828-0.981); the cutoff score was determined to be 3.31, which produced a high PPV (96.4%) and NPV (85.4%).
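As a reminder of what the AUC measures, it equals the probability that a randomly chosen HT patient has a higher score than a randomly chosen FL patient, with ties counted as one half. The Python sketch below computes the AUC in this pairwise way. It is a minimal illustration, not the EZR/R procedure used in the study, and the scores in the example are made up for demonstration; they are not study data.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up scores for illustration only (not study data).
ht_scores = [3, 2, 5, 1]
fl_scores = [0, 1, 0, 2]
print(roc_auc(ht_scores, fl_scores))
```

The pairwise form is quadratic in the number of patients, which is acceptable for cohorts of this size; rank-based formulas give the same value more efficiently.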
To develop a simpler scoring system that preserved these PPV and NPV values, we assigned the scores with reference to the regression coefficients and the previous cutoff score (Table 2). With the simplified transformation scoring system (TSS), the cutoff value was identified to be 2, which gave the same predictive values; the PPV and NPV were 96.4% and 85.4%, respectively (Figure 2B). According to the TSS score, the percentages of patients with HT among those with scores of 0, 1, 2, 3, 4, and ≥5 were 12%, 17%, 91%, 100%, 100%, and 100%, respectively (Table 3 and Figure 3).
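The simplified TSS can be sketched as a short function that sums the four components. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: the function and parameter names are hypothetical, LDH is passed as a ratio to the ULN, and an LDH of exactly 2× ULN is given 2 points because the stated bands (≤ULN ×2 and ≥ULN ×2) overlap at that value.

```python
from typing import Optional

TSS_CUTOFF = 2  # scores >= 2 indicated a high probability of HT in the text


def tss_score(ldh_over_uln: float,
              focal_ln_cm: Optional[float],
              hemoglobin_g_dl: float,
              performance_status: int) -> int:
    """Sum the four TSS components described in the text.

    ldh_over_uln: LDH divided by the upper limit of normal (hypothetical input format).
    focal_ln_cm: diameter of the single focal nodal mass, or None if there is
        no focal LN enlargement.
    """
    score = 0
    # LDH: > ULN up to 2x ULN -> 1 point; 2x ULN or more -> 2 points.
    # Assumption: exactly 2x ULN scores 2 points (the stated bands overlap there).
    if ldh_over_uln >= 2.0:
        score += 2
    elif ldh_over_uln > 1.0:
        score += 1
    # Focal LN enlargement: >= 3 cm and < 7 cm -> 1 point; >= 7 cm -> 2 points.
    if focal_ln_cm is not None:
        if focal_ln_cm >= 7.0:
            score += 2
        elif focal_ln_cm >= 3.0:
            score += 1
    # Hemoglobin < 12 g/dl -> 1 point.
    if hemoglobin_g_dl < 12.0:
        score += 1
    # Poor performance status (2-4) -> 2 points.
    if performance_status >= 2:
        score += 2
    return score


def is_high_tss(score: int) -> bool:
    """True when the score meets the cutoff of 2 reported in the text."""
    return score >= TSS_CUTOFF
```

With these rules the score ranges from 0 to 7. A patient with LDH at 1.5× ULN, a 4 cm focal nodal mass, hemoglobin of 11 g/dl, and PS 1 would score 3, which is above the cutoff.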

| External validation set
As shown in Figure 1B [...] Table S1. We applied the TSS to this completely independent cohort for external validation. Based on the ROC curve analysis, the AUC of the TSS in the external validation cohort was 0.900 (95% CI, 0.815-0.987) (Figure 2C). Furthermore, a score of 2 or higher produced a high PPV and NPV of 93.3% and 82.9%, respectively, for HT diagnosis, which confirmed the validity of the TSS. According to the TSS score, the percentages of patients with HT among those with scores of 0, 1, 2, 3, 4, and ≥5 were 5%, 33%, 100%, 80%, 100%, and 100%, respectively (Table 3 and Figure 3).
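For readers less familiar with predictive values, the Python sketch below shows how PPV and NPV follow from a 2×2 table at a given cutoff. The counts in the example are an assumption: they are one combination, back-calculated here, that is consistent with the 50 patients and the reported percentages (14/15 and 29/35). They are not taken from the study's tables.

```python
from typing import Tuple


def ppv_npv(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (PPV, NPV) from the counts of a 2x2 diagnostic table.

    tp/fp: patients at or above the cutoff with / without HT on biopsy.
    tn/fn: patients below the cutoff without / with HT on biopsy.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one test-positive and one test-negative patient")
    ppv = tp / (tp + fp)  # share of score >= cutoff patients who truly have HT
    npv = tn / (tn + fn)  # share of score < cutoff patients who truly do not
    return ppv, npv


# Assumed counts, back-calculated to match the reported percentages (not source data).
ppv, npv = ppv_npv(tp=14, fp=1, tn=29, fn=6)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")
```

Both values depend on the proportion of HT in the cohort, so they would be expected to shift if the TSS were applied to a population with a different prevalence of transformation.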
Among 76 patients with biopsy-proven FL or HT in the derivation set, the probability of 5-year OS after disease progression was 96.9% (95% CI, 79.8-99.6) in patients with biopsy-proven FL and 62.2% (95% CI, 34.9-80.8) in patients with HT (p < 0.001; Figure 4A). Further, the probability of 5-year OS after disease progression was lower in patients with higher TSS scores (≥2) than in patients with lower scores (0-1) (58.9% [95% CI, 34.0-77.1] vs. 95.8% [95% CI, 73.9-99.4], p < 0.001; Figure 4B). Furthermore, among the 459 patients with FL in the NCCH cohort, 104 who developed disease progression could not undergo biopsy, including seven who were diagnosed with clinical transformation and treated accordingly. As an internal validation analysis, the TSS score distributions at first disease progression in these 104 patients are shown in Figure S1C and Table S2; 20 patients (19%) had a score of 2 or higher. Interestingly, the probability of 5-year OS after disease progression was lower in patients with higher scores than in patients with lower scores (63.4% [95% CI, 35.8-81.7] vs. 98.2% [95% CI, 88.0-99.7], p < 0.001; Figure 4C). Further, almost all the patients (86%) with higher TSS scores died of lymphoma, as shown in Table S2. Regarding the salvage therapies for the patients with higher scores, there were no statistically significant differences between this cohort and the derivation cohort, except with rituximab monotherapy (Table S3).

| DISCUSSION
We developed and validated a new scoring system for determining the clinical transformation of FL, TSS, using two independent cohorts (the NCCH and NCCHE cohorts). It is difficult to obtain a biopsy specimen in all patients with FL at the time of disease progression. In fact, previous studies have indicated a low rate (20.6-42%) of performing biopsy at the time of disease progression of FL. 16,21,28 Therefore, although histologic analysis is the gold standard for diagnosing transformation, our new scoring system will be useful for assessing the probability of transformation in patients who are unable to undergo biopsy. Diagnosing transformation is important for patients with FL because, despite the availability of rituximab, HT is still strongly associated with mortality in patients with FL. 19,21 Further, treatment strategies for patients with HT could be more intensive than those for patients without HT, and include procedures such as hematopoietic stem cell transplantation. 16,33-35 Moreover, because the incidence of HT is one of the designated clinical trial endpoints of FL, reliable diagnosis of transformation is essential for assessing this endpoint accurately. However, many patients with FL could not undergo a biopsy at the time of disease progression, even in clinical trials, 15,16,20,23,24,26 which resulted in varying rates of transformation reported among such trials. Patients without biopsies are currently diagnosed with clinical transformation solely based on their clinical characteristics; however, because of the lack of standardized criteria for diagnosing the clinical transformation of FL, it has been difficult to compare the incidence rate of HT among the previously published studies. Several studies have compared the clinical factors of FL and HT at the time of the initial diagnosis of FL to predict the risk of HT, 15-18,20-26 including prospective cohort studies with a large number of patients. 16,26 However, among available studies, the number of HT patients diagnosed by biopsy was limited, 16-18,21,23,25 which resulted in varying HT risk factors and incidence rates being reported. On the other hand, although a few studies have assessed the clinical factors at the time of disease progression in patients with HT, 17,18,23,24 detailed comparisons between FL and HT have not been performed in the era of rituximab availability. Therefore, we elucidated the clinical factors associated with HT at the time of disease progression in the immunochemotherapy era.

FIGURE 3 Probability of histologic transformation according to the transformation scoring system in the derivation and external validation sets. Abbreviation: HT, histologic transformation

A well-known criterion for clinical transformation was derived from the Vancouver population-based analysis in the pre-rituximab era, 24 wherein clinical transformation was arbitrarily defined as exhibiting one or more of five clinical manifestations: rapid nodal growth, extranodal sites, new B symptoms, LDH over twice the ULN, and new hypercalcemia. The reliability of this criterion was demonstrated by the close similarity in the clinical outcomes of patients diagnosed with clinical transformation using the criterion to those diagnosed by biopsy. However, cohorts of patients with HT may be different in the pre- and post-rituximab eras, since comparisons of these two periods have shown that the clinical outcomes of patients with HT were worse and the incidence of HT was higher in the pre-rituximab era. 15,18,21,26 In addition, these five clinical factors were not verified using statistical models, although the impact of each of these factors on patients with HT is likely to be different. Thus, currently, there are no standardized criteria for diagnosing the clinical transformation of FL. In our study, we extracted the detailed clinical factors at the time of disease progression only from patients with biopsy-proven histology and performed statistical analyses, including the validation analysis, on these factors. This was an attempt to standardize the definition of clinical transformation in the rituximab era. Furthermore, because "rapid nodal growth," one of the factors comprising the Vancouver criterion, was not rigorously defined, it may be difficult to use this criterion accurately in both daily practice and clinical trials. In contrast, "focal LN enlargement," a component of the TSS, was strictly defined and may indicate that only one nodal area progressed more rapidly than other nodal areas in patients with HT, which might better describe "rapid nodal growth." Another possibility is that the persistence of one enlarged LN in FL patients may be associated with the development of HT. Therefore, the TSS can be evaluated quantitatively at a single time point, thereby providing an easy assessment of FL transformation in both daily practice and clinical trials.

FIGURE 4 Kaplan-Meier curves showing overall survival according to histology and the transformation scoring system. Probability of overall survival after disease progression in patients with FL vs. HT in the derivation set (A), overall survival after disease progression in patients with high vs. low TSS scores in the derivation set (B), and overall survival after disease progression in 104 patients who did not undergo biopsy at the first progression with high vs. low TSS scores in the internal validation set (C). Abbreviations: FL, follicular lymphoma; HT, histologic transformation; TSS, transformation scoring system

As an internal validation analysis, we applied the TSS to 104 patients who did not undergo biopsy at the time of disease progression. Among them, the majority of patients (81%) had lower scores according to the TSS. Importantly, the prognosis of the patients with higher scores (n = 20) was similar to that of patients with HT, although the salvage therapies in the two cohorts were not the same. This might indicate that the TSS can be used to stratify FL patients with disease progression who did not undergo biopsy and to diagnose clinical transformation in them. Moreover, our new scoring system may be used as a prognostic index at the time of FL disease progression because, among the 180 patients with disease progression in the NCCH cohort, the prognoses of patients with higher scores were poorer than those of patients with lower scores, as shown in Figure 4B,C.
We also assessed the SUVmax value to distinguish HT patients from FL patients who underwent FDG-PET/CT (Supplementary material). In both the derivation and validation cohorts, a high SUVmax value indicated that patients with FL had developed HT, which was consistent with previous studies. 36-39 Although we tried to incorporate the SUVmax value into the TSS, a superior model could not be developed. Even in the cohort of patients who received PET/CT, the TSS was superior to the scoring system that incorporated the SUVmax value in the derivation set (data not shown). Furthermore, the scoring system incorporating the SUVmax value was not validated well because the SUVmax value in patients with HT in the external validation cohort was higher than that in the derivation cohort, despite the use of similar PET/CT scanners, protocols, and software in the two hospitals. Theoretically, the SUVmax value would vary among institutions because of differences in PET/CT scanners and methods of SUV quantification. Thus, it is difficult to apply a given SUVmax value to other institutions. In addition, because a recent study suggested that a high SUVmax value in patients with FL at initial diagnosis was not associated with HT, 40 an increase in the SUVmax value at the time of disease progression compared with that at initial diagnosis may be important for assessing HT. For these reasons, we did not incorporate the SUVmax value in the TSS.
This study has several limitations. First, due to the retrospective nature of this study, we analyzed a limited number of patients who underwent a biopsy at the time of disease progression. This might have resulted in potential bias in developing the TSS, even though we validated it in a completely independent cohort. Second, the decision to perform a biopsy was at the physician's discretion; however, the TSS was also validated in patients who did not undergo biopsy at the first progression. Third, as the TSS was developed to assess HT at the time of disease progression, it cannot predict HT in patients at the initial diagnosis of FL. Therefore, to confirm the TSS, prospective studies comprising a large number of patients may be warranted.
In conclusion, we developed a new scoring system for the clinical transformation of FL, TSS, and validated it in an independent cohort. The TSS promises to be a simple yet valuable tool for the diagnosis of clinical transformation in both daily practice and clinical trials, especially in patients for whom obtaining a biopsy specimen is not feasible.

ACKNOWLEDGMENTS
This work was supported in part by a grant from the Japan Agency for Medical Research and Development (AMED) under Grant Numbers 17ck0106349h0001 and 18ck0106349h0002. The authors would like to thank the medical, nursing, data processing, laboratory, and clinical staff members at the participating centers for their important contributions to this study and their dedicated patient care.