Radiomic prediction for durable response to high‐dose methotrexate‐based chemotherapy in primary central nervous system lymphoma

Abstract Background The rarity of primary central nervous system lymphoma (PCNSL) and treatment heterogeneity contribute to a lack of prognostic models for evaluating posttreatment remission. This study aimed to develop and validate radiomic‐based models to predict the durable response (DR) to high‐dose methotrexate (HD‐MTX)‐based chemotherapy in PCNSL patients. Methods A total of 159 patients pathologically diagnosed with PCNSL between 2011 and 2021 across two institutions were enrolled. According to the NCCN guidelines, DR was defined as remission lasting ≥1 year after receiving HD‐MTX‐based chemotherapy. For each patient, a total of 1218 radiomic features were extracted from prebiopsy T1 contrast‐enhanced MR images. Multiple machine‐learning algorithms were used for feature selection and classification to build a radiomic signature. The radiomic‐clinical integrated models were developed using the random forest method. Model performance was externally validated to verify its clinical utility. Results A total of 105 PCNSL patients were enrolled after excluding 54 ineligible cases. The training and validation cohorts comprised 76 and 29 individuals, respectively. Among them, 65 patients achieved DR. The radiomic signature, consisting of 8 selected features, demonstrated strong predictive performance, with areas under the curve of 0.994 in the training cohort and 0.913 in the validation cohort. This signature was independently associated with DR in both cohorts. Both the radiomic signature and the integrated models significantly outperformed the clinical models in both cohorts. Decision curve analysis underscored the clinical utility of the established models. Conclusions The radiomic signature and integrated models have the potential to accurately predict DR to HD‐MTX‐based chemotherapy in PCNSL patients, providing valuable therapeutic insights.


| INTRODUCTION
Primary central nervous system lymphoma (PCNSL) is a rare extranodal non-Hodgkin lymphoma that involves only the brain, spine, cerebrospinal fluid, and eyes without evidence of systemic spread. 1 Over the past decades, high-dose methotrexate (HD-MTX)-based polychemotherapy regimens have been widely adopted as the standard of care for PCNSL patients. 2,4-6 Notably, only a quarter of patients had a remission lasting over 3 years after receiving HD-MTX treatment. 4,7 This discrepancy may derive from individual heterogeneity in the physiological response to MTX, compounded by the variety of adjuvant medications used across different institutions and regions. 2 Nevertheless, the use of MTX as the backbone of PCNSL treatment remains undisputed. 2,8 National Comprehensive Cancer Network (NCCN) guidelines state that a patient receiving HD-MTX-based chemotherapy without radiotherapy is considered not to have responded to treatment if remission lasts less than 1 year. 9 Evidence has shown that patients who completed chemotherapy but experienced tumor progression within 1 year had a dismal prognosis, with a median survival of only 2 months. 2 In addition, administration of HD-MTX has significant side effects, which may be challenging for elderly patients with reduced physiological reserves. 4,10 Therefore, early identification of the patients who achieve durable remission/response (≥1 year) with HD-MTX-based chemotherapy is crucial. It will facilitate the development of individualized therapeutic strategies and improve tumor remission rates.
2][13] While radiomics has been widely employed in differentiating between PCNSL and glioblastoma, 14,15 the correlation between radiomic features and therapy response in PCNSL remains relatively unclear. 15 Several studies have reported the prognostic potential of radiomic signatures in predicting outcomes in PCNSL patients, although validation in independent external cohorts has been lacking. 16,17 A recent study by Destito et al. employed radiomics-based machine learning tools to predict overall and progression-free survival in PCNSL. 18 However, the selected radiomic features may not yet have a well-established value for individual prognosis because of the complexities and inconsistencies in treatment details. Furthermore, given the absence of consensus regarding systemic chemotherapy regimens for PCNSL, current clinical models such as the International Extranodal Lymphoma Study Group (IELSG) score 19 and the Memorial Sloan-Kettering Cancer Center (MSKCC) classification 20 may not have optimal predictive performance for prognosis. Therefore, priority should be given to developing improved biologic and radiologic predictive models for PCNSL, which are crucial to advancing therapy.
The primary objectives of this study were as follows: (i) to use machine learning-based algorithms to identify the pivotal radiomic characteristics that could predict a durable response (DR) to HD-MTX-based chemotherapy (remission lasting more than 1 year after treatment) in PCNSL patients; (ii) to develop integrated models that combine radiomic and clinical features, aiming to improve the predictive capacity for DR to HD-MTX-based treatment in PCNSL patients; and (iii) to conduct an external validation assessing the reliability and robustness of our findings using data from another institution.

| Study protocol approvals and patient consents
This retrospective study was approved by the Institutional Review Board of our institution (YW2019-016-11). The requirement for written informed consent was waived.

| Patient assessment and variable collection
This study retrospectively enrolled patients pathologically diagnosed with PCNSL between 2011 and 2021 from the institutional databases of Beijing Tiantan Hospital (Centre 1, the training cohort; n = 112) and the Sixth Medical Center of PLA General Hospital (Centre 2, the external validation cohort; n = 47) (Figure 1). Biopsy-proven PCNSL patients were eligible for enrollment if they were immunocompetent, were 18 years of age or older, had normal liver and renal function before chemotherapy, received neither whole brain radiotherapy (WBRT) nor autologous stem cell transplant (ASCT), and had completed T1 contrast-enhanced (T1CE) MR imaging with intact follow-up information. Patients were excluded if they had undergone craniotomy for PCNSL or other CNS diseases, had received corticosteroids and/or Bruton tyrosine kinase (BTK) inhibitor therapy during the initial treatment period, or had evidence of extra-CNS involvement, a low KPS score (≤50) unrelated to PCNSL, poor brain image quality, active infection, concomitant malignancy, pregnancy, or breastfeeding. Baseline clinical data included age, gender, Karnofsky performance scale (KPS) score, Eastern Cooperative Oncology Group-Performance Status (ECOG-PS), IELSG scores, MSKCC scores, tumor location, and treatment strategies.

| Treatment protocol
The initial treatment consisted of an induction phase and a consolidation phase. During the induction period, each patient received 4-6 cycles of HD-MTX (3 g/m²) intravenously on Day 1 of a 14-day cycle. Among all patients, 61 individuals were administered HD-MTX alone, and the remainder received HD-MTX plus adjuvant therapy (details in Table 1). As recommended by NCCN guidelines, the chemotherapy regimens employed during the induction stage were to be maintained without modification and continued into the consolidation phase if tolerated. Consequently, during the consolidation period, more than half of the patients continued the original HD-MTX-based chemotherapy regimens monthly for up to 1 year, while approximately one-third of the patients were switched to alternative chemotherapy regimens because of intolerance and toxicity associated with HD-MTX administration during the treatment period (Table 1). Supportive therapy included intravenous administration of folinic acid every 6 h following 24 h of HD-MTX administration, and granulocyte colony-stimulating factor injections if needed. Complete response (CR) was defined as per the International PCNSL Collaborative Group (IPCG) criteria. 21 All chemotherapy regimens were completed for all patients regardless of whether CR was achieved earlier. No grade 4 toxic effects were reported during the period of therapy according to the WHO's 1996 classification.
FIGURE 1 Flow diagrams of patient selection and radiomics construction. ASCT, autologous stem cell transplant; DR, durable response; LoR, loss of response; ROC, receiver operating characteristic; WBRT, whole brain radiotherapy.

| Follow-up and endpoint
Patients treated at the two institutions were followed up at 3-month intervals. Progression-free survival (PFS) was defined as the interval from the date of diagnosis to the date of recurrence or the last follow-up. According to the NCCN guidelines, the endpoint was defined as remission lasting ≥1 year after HD-MTX-based chemotherapy, namely DR to HD-MTX-based chemotherapy. Specifically, DR to HD-MTX-based chemotherapy was determined by either of two criteria: (i) patients who achieved CR after initial treatment remained in remission for 1 year or longer without new lesions appearing on T1CE MR images; (ii) patients who did not achieve CR after initial treatment showed radiological stability or continuous reduction in the volume of targeted lesions for over 1 year without new lesions appearing on T1CE MR images. Patients who completed initial treatment but lacked durable remission (more than 1 year) were considered to have loss of durable response (LoDR) to treatment.
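The two endpoint criteria above can be expressed as a small labelling rule. The following is a minimal sketch only: the record fields and function name are hypothetical, a single 12-month cutoff is assumed for both criteria (the text uses "1 year or longer" for criterion i and "over 1 year" for criterion ii), and patients with less than 12 months of follow-up are not handled separately.

```python
from dataclasses import dataclass

# Assumed cutoff: "remission lasting >= 1 year", expressed in months.
DR_THRESHOLD_MONTHS = 12


@dataclass
class FollowUp:
    """Hypothetical per-patient follow-up record (field names are illustrative)."""
    achieved_cr: bool                  # complete response after initial treatment
    months_without_progression: float  # sustained remission (CR) or stable/shrinking lesions (non-CR)
    new_lesion_on_t1ce: bool           # new lesion seen on T1CE MRI in that period


def classify_response(record: FollowUp) -> str:
    """Label a patient as DR (criterion i or ii) or LoDR."""
    durable = (
        record.months_without_progression >= DR_THRESHOLD_MONTHS
        and not record.new_lesion_on_t1ce
    )
    if not durable:
        return "LoDR"
    # Criterion (i) applies to patients who reached CR, criterion (ii) to those who did not.
    return "DR (criterion i)" if record.achieved_cr else "DR (criterion ii)"


if __name__ == "__main__":
    print(classify_response(FollowUp(True, 14, False)))
    print(classify_response(FollowUp(False, 9, False)))
```

In practice the label would be assigned by clinicians reviewing serial MR images; the sketch only makes the decision logic explicit.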

| Image acquisition and preprocessing
As recommended by IPCG, 21 all patients underwent T1CE MR imaging with DTPA-Gd injection before biopsy at the two centers. The detailed parameters of MR image acquisition are described in Table S1. All T1CE scans were retrieved from the Picture Archiving and Communication System for further image processing. Prebiopsy MRIs were analyzed by an experienced neurosurgeon (M.L, with 8 years of clinical experience). All segmentation masks were confirmed by a senior radiologist (L.L.Y) and a senior neurosurgeon (X.H.R), both of whom had more than 15 years of clinical experience. Disagreements were resolved through consensus-based discussion. Regions of interest (ROIs, consisting of enhanced tumor plus necrosis) were manually outlined on the T1CE images using ITK-SNAP software (www.itksnap.org) (Figure 1).

| Feature selection
To reduce potential bias and overfitting, a coarse-to-fine approach was used to optimize the feature selection process. 23 Initially, an analysis of variance (ANOVA) was used to address the impact of scanner effects, which are nonbiological variations produced by image acquisition settings. Next, univariate analysis was performed using the Mann-Whitney U test and the Pearson correlation coefficient to compare each radiomic feature between the DR group and the LoDR group. Features were ranked by ascending p-value in each analysis, and those appearing in the top 5% of both rankings were retained for further analysis. Afterward, five machine-learning methods were employed to select relevant features: recursive feature elimination based on a support vector machine, least absolute shrinkage and selection operator, extremely randomized trees, random forest (RF), and ridge regression. The optimal radiomic features were ultimately identified based on their repeated appearance among all five methods (Table S2). This selection process was initially carried out on the training set and then validated on the validation set.
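The overlap logic of the coarse-to-fine selection can be sketched as set intersections. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, rounding the top-5% cutoff up with a minimum of one feature is an assumption, and the p-values and per-method selections are taken as precomputed inputs rather than derived from images.

```python
import math


def top_fraction(p_values: dict[str, float], fraction: float = 0.05) -> set[str]:
    """Return the feature names with the smallest p-values (top `fraction`)."""
    # Assumption: round the cutoff up and always keep at least one feature.
    k = max(1, math.ceil(len(p_values) * fraction))
    ranked = sorted(p_values, key=p_values.get)  # ascending p-value
    return set(ranked[:k])


def coarse_filter(p_test_a: dict[str, float], p_test_b: dict[str, float],
                  fraction: float = 0.05) -> set[str]:
    """Coarse step: keep features in the top fraction of BOTH univariate analyses."""
    return top_fraction(p_test_a, fraction) & top_fraction(p_test_b, fraction)


def consensus(selections: list[set[str]]) -> set[str]:
    """Fine step: keep features chosen by every selection method."""
    if not selections:
        return set()
    return set.intersection(*selections)


if __name__ == "__main__":
    # Toy p-values for eight hypothetical features.
    p_a = {f"f{i}": (i + 1) / 100 for i in range(8)}
    p_b = {f"f{i}": (8 - i) / 100 for i in range(8)}
    p_b["f1"] = 0.001
    print(coarse_filter(p_a, p_b, fraction=0.25))
    print(consensus([{"a", "b", "c"}, {"b", "c"}, {"b", "c", "d"}]))
```

Requiring agreement across all methods is deliberately strict: it trades sensitivity for a short, stable feature list, which matters when the training cohort is small.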

| Clinical use
Decision curve analysis (DCA) was plotted to evaluate the clinical net benefits of clinical models, radiomic signatures, and clinical-radiomic integrated models at different threshold probabilities. 24 Furthermore, the net reclassification index (NRI) and integrated discrimination improvement (IDI) were calculated to compare the clinical usefulness among the established models.
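For readers unfamiliar with DCA, the quantity plotted at each threshold probability is the net benefit, TP/n − (FP/n) × pt/(1 − pt). The sketch below uses this standard definition; the function names and the toy data are illustrative assumptions and do not reproduce the study's analysis.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted probability >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy labels (1 = event) and predicted probabilities, for illustration only.
    labels = [1, 1, 0, 0]
    probs = [0.9, 0.6, 0.7, 0.2]
    for pt in (0.2, 0.5, 0.8):
        print(pt, net_benefit(labels, probs, pt), net_benefit_treat_all(labels, pt))
```

The treat-none strategy has a net benefit of zero at every threshold, so a model is useful over the range where its curve lies above both zero and the treat-all line.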

| Statistical analysis
Categorical variables are expressed as percentages, while continuous variables are presented as medians with interquartile ranges (IQRs). Clinical data were compared between the DR and LoDR groups, as well as between the training and validation cohorts, using the Mann-Whitney U test, chi-square test, or Fisher's exact test, as appropriate. Generalized multivariable logistic analysis was conducted to identify favorable factors related to durable response. A multivariable Cox regression model was employed to further investigate the correlation between the radiomic signature and PFS. All statistical computations were performed using IBM SPSS v26.0 and Python v3.1.0. A two-tailed p < 0.05 was considered statistically significant.

| Patient characteristics
As shown in Figure 1, a total of 159 PCNSL patients who received HD-MTX-based chemotherapy were initially included in the study. Of these, 112 patients were assigned to the training cohort, and 47 patients were assigned to the external validation cohort. Based on the inclusion and exclusion criteria, 54 patients were deemed ineligible and subsequently excluded from the analysis for the following reasons: 11 patients lacked follow-up information, 14 patients had poor image quality, 19 patients underwent WBRT or ASCT, and 5 patients had surgical resections.
After careful screening, a total of 105 eligible PCNSL patients were enrolled in this analysis (n [training cohort] = 76; n [validation cohort] = 29). The median follow-up time of this cohort was 16 months with an IQR of 6-34 months. Among all eligible individuals, 65 patients had DR to HD-MTX-based chemotherapy. The baseline features of the patients are shown in Table 1 and Table S3. Baseline clinical parameters were balanced, and the DR rate did not differ significantly between the training cohort and validation cohort (61.9% for the training cohort vs. 60.5% for the validation cohort, p = 0.638). In the training cohort, several clinical characteristics were significantly different between the DR and LoDR groups, including age, KPS score, ECOG score, IELSG score, MSKCC classification, and treatment strategies. However, in the validation cohort, no significant differences were observed in clinical variables between the two groups.

| Feature selection and radiomic signature construction and validation
According to the coarse-to-fine feature selection strategy, eight imaging features were finally selected from the T1CE sequence (Table S4). Based on these selected features, a radiomic signature was constructed using the RF model in the training cohort.
There was a significant difference in the Radscore between the DR and LoDR groups in both cohorts (Table S3). In the training cohort, the radiomic signature yielded an AUC of 0.994 (95% CI, 0.981-1.000), with an accuracy of 0.974, sensitivity of 0.967, specificity of 0.978, PPV of 0.967, and NPV of 0.978, for differentiating the DR and LoDR groups. In the validation cohort, discrimination remained high, with an AUC of 0.913, sensitivity of 0.900, and NPV of 0.933. However, accuracy (0.793), specificity (0.737), and PPV (0.643) were lower in the validation cohort than in the training cohort (Figure 2 and Table 3).
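The relationship between a confusion matrix and the reported metrics can be made explicit with a short sketch. The counts used here (9, 1, 5, 14) are an assumption: they were chosen because they sum to the validation cohort size of 29 and reproduce the reported validation values after rounding, but the source does not list the underlying counts or state which group was treated as the positive class.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Standard metrics derived from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Assumed counts (not reported in the source) consistent with n = 29.
validation = classification_metrics(tp=9, fn=1, fp=5, tn=14)

if __name__ == "__main__":
    for name, value in validation.items():
        print(f"{name}: {value:.3f}")
```

With only 10 cases in the assumed positive class, a single misclassified patient shifts sensitivity by 0.1, which illustrates how coarse these estimates are in a cohort of this size.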

| Association of radiomic signature with DR and PFS
FIGURE 2 Predictive performance of the established models for differentiating response to high-dose methotrexate-based chemotherapy in the training and validation cohorts. For confusion matrices, the color depends on the number inside the square: the higher the number, the darker the color. For receiver operating characteristic curves, the areas under the curve are shown and compared among different models. AUC, area under the curve; IELSG, International Extranodal Lymphoma Study Group; MSKCC, Memorial Sloan-Kettering Cancer Center.
To minimize confounding and prediction bias, a generalized multivariable logistic model was employed to investigate the independent association between the radiomic signature and DR. The analysis showed that the radiomic signature independently distinguished between the DR and LoDR groups (beta, 1.210; SE, 0.216; p < 0.001) (Table 2). Consistently, this predictive capacity was observed after adjusting for confounders in the validation cohort (beta, 1.951; SE, 0.375; p < 0.001) (Table 2). We further investigated the impact of the radiomic signature on PFS. Multivariable Cox regression demonstrated that higher Radscore values were significantly associated with a decreased risk of relapse in both the training cohort (HR 0.005, 95% CI: 0.001-0.030, p < 0.001) and the validation cohort (HR 8.3 × 10⁻⁵, 95% CI: 6.1 × 10⁻⁷ to 0.011, p < 0.001). Additionally, older age was found to contribute to an increased risk of recurrence (HR 1.060, 95% CI: 1.001-1.125, p = 0.045) in the training cohort. However, this significant correlation was not observed in the validation cohort (Table 3).
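A logistic regression coefficient and its standard error can be converted to an odds ratio with an approximate confidence interval. The sketch below assumes a Wald (normal-approximation) 95% interval, which the source does not confirm, and the unit of the Radscore to which the coefficient applies is not stated in the text.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower bound, upper bound) from a logit coefficient.

    Assumes a Wald interval: exp(beta +/- z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Coefficients reported for the training and validation cohorts.
    for cohort, beta, se in (("training", 1.210, 0.216), ("validation", 1.951, 0.375)):
        or_, lower, upper = odds_ratio_with_ci(beta, se)
        print(f"{cohort}: OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is computed on the log-odds scale and then exponentiated, it is asymmetric around the odds ratio.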

| Development and validation of the clinical-radiomic integrated models
Based on the current clinical prognostic models, two integrated models, namely the Radscore-IELSG and Radscore-MSKCC models, were constructed. In the training cohort, these two integrated models achieved high performance for differentiating the DR and LoDR groups, with AUCs of 0.998 (95% CI, 0.992-1.000) in the Radscore-IELSG model and 0.998 (95% CI, 0.998-1.000) in the Radscore-MSKCC model. For the Radscore-IELSG model, confusion matrix analysis revealed accuracy, sensitivity, specificity, PPV, and NPV of 0.987, 1.000, 0.978, 0.968, and 1.000, respectively. Similarly, the Radscore-MSKCC model exhibited high accuracy, sensitivity, specificity, PPV, and NPV values of 0.974, 0.967, 0.967 and 0.978, respectively, facilitating discrimination between the DR and LoDR groups (Figure 2 and Table 4). In the validation cohort, these two integrated models maintained strong predictive performance, with an AUC of 0.934 for both the Radscore-IELSG and Radscore-MSKCC models. However, accuracy, specificity, and PPV were lower in the validation cohort than in the training cohort (Figure 2 and Table 4).

| Comparison of performance among established models
As illustrated in Figure 2, the predictive performance of the radiomic signature was superior to both the IELSG model (AUC: 0.994 vs. 0.628, p < 0.001) and the MSKCC model (AUC: 0.994 vs. 0.699, p < 0.001). Both clinical-radiomic integrated models also significantly outperformed these two clinical models. However, the predictive ability was not significantly improved in either the Radscore-IELSG model (AUC: 0.998 vs. 0.994, p = 0.193) or the Radscore-MSKCC model (AUC: 0.998 vs. 0.994, p = 0.179) compared to the radiomic signature. Notably, there was no significant difference in predictive performance between the two integrated models (p > 0.999). Similar comparative results were also found in the validation cohort.
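The AUC values compared above have a simple rank interpretation: the probability that a randomly chosen case from one group receives a higher score than a randomly chosen case from the other, with ties counted as one half. The sketch below computes the AUC this way on toy data; it does not reproduce the significance tests, because the source does not state which method was used to compare AUCs.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, for illustration only.
    print(auc([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]))
```

The pairwise loop is quadratic in cohort size, which is negligible for cohorts of 76 and 29 patients.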

| Clinical use
DCA showed that using the established models to predict DR to MTX-based treatment added more benefit than either the treat-all or treat-none schemes in both the training and validation cohorts (Figure 3). Notably, on visual inspection, using the Radscore-IELSG model in clinical decision-making may yield greater benefit than the other established models in both the training and validation cohorts (Figure 3). Furthermore, the NRI and IDI values consistently indicated that the radiomic signature had greater predictive power than the two clinical models. Additionally, both integrated models might be particularly beneficial for clinical use when compared to the radiomic signature (IDI for the Radscore-IELSG model: 0.08, IDI for the Radscore-MSKCC model: 0.09; all p < 0.001) in the training cohort. However, there were no significant differences in the NRI and IDI values between the radiomic signature and the two integrated models in the validation cohort (Table 5).
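The IDI and NRI reported above compare the predicted probabilities of a new model against a reference model. The sketch below uses the standard IDI definition and a category-free (continuous) NRI; whether the study used categorical or continuous NRI is not stated, so the latter is an assumption, and the function names and toy data are illustrative.

```python
from statistics import mean


def _split(y_true, probs):
    """Separate predicted probabilities into events and non-events."""
    events = [p for y, p in zip(y_true, probs) if y == 1]
    nonevents = [p for y, p in zip(y_true, probs) if y == 0]
    return events, nonevents


def idi(y_true, p_old, p_new):
    """Integrated discrimination improvement of p_new over p_old."""
    ev_old, ne_old = _split(y_true, p_old)
    ev_new, ne_new = _split(y_true, p_new)
    return (mean(ev_new) - mean(ev_old)) - (mean(ne_new) - mean(ne_old))


def continuous_nri(y_true, p_old, p_new):
    """Category-free NRI: net upward moves in events plus net downward moves in non-events."""
    ev = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    ne = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 0]
    ev_part = (sum(n > o for o, n in ev) - sum(n < o for o, n in ev)) / len(ev)
    ne_part = (sum(n < o for o, n in ne) - sum(n > o for o, n in ne)) / len(ne)
    return ev_part + ne_part


if __name__ == "__main__":
    # Toy data, for illustration only.
    y = [1, 1, 0, 0]
    old = [0.6, 0.5, 0.5, 0.4]
    new = [0.8, 0.7, 0.3, 0.2]
    print(idi(y, old, new), continuous_nri(y, old, new))
```

A positive IDI means the new model separates the mean predicted probabilities of the two groups more widely than the reference model does.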

| Webpage development tool
Based on the feasibility of the Radscore-IELSG model, we further developed a simple, easy-to-use web tool for clinical practice. This application allows users to input Radscore and IELSG values through the interface, after which it calculates the predicted probability of DR to HD-MTX-based chemotherapy. The web application is accessible online at https://radscore-ielsg-model.shinyapps.io/App-1/.

| DISCUSSION
In this multicenter cohort study, we investigated the ability of a radiomic signature based on prebiopsy T1CE MR images to identify PCNSL patients who were treated solely with HD-MTX-based chemotherapy and achieved remission for over 1 year. The proposed radiomic model demonstrated strong performance during external validation. Additionally, we developed clinical-radiomic integrated models that performed well in both the training and validation cohorts. These models show promise as valuable tools for guiding therapeutic decisions at the individual level.
Accurate and individualized prediction of DR to MTX-based chemotherapy is crucial for managing PCNSL patients. However, due to the rarity of this tumor, few studies have focused on this aspect. To the best of our knowledge, this study is the first to develop radiomic-based models for predicting durable remission in PCNSL patients who received solely HD-MTX-based chemotherapy. Given that the extent of therapeutic response evaluated by imaging does not seem to reflect survival after the end of treatment for PCNSL patients, 25 our study adopted DR as an endpoint that combines information on both treatment response status and PFS (i.e., posttreatment remission lasting more than 1 year). By using DR, we aim to enhance predictive accuracy in clinical practice. Defining DR with reference to the NCCN guidelines also lends rigor to the endpoint, especially given the absence of a consensus on optimal HD-MTX-based chemotherapy regimens. 7][28][29] Furthermore, unlike previous studies, 16-18 our models were externally validated using data from another institution, enhancing the reliability of our findings. 7][18][19][20] To mitigate the predictive bias resulting from additional medication use, more than half of the patients in our study received HD-MTX alone during the therapeutic period. It is important to note that this does not imply that the additional drugs lack efficacy in treating PCNSL.
Additionally, we observed a significant positive correlation between older age and recurrence risk in the training cohort, but this relationship did not reach statistical significance in the validation cohort. A possible explanation for this discrepancy is the relatively small sample size of the validation cohort, which makes it difficult to establish a stable relationship between age and PFS. Furthermore, elderly patients often exhibit inferior long-term tolerance to HD-MTX treatment. Given the diverse design of HD-MTX-based chemotherapy regimens in real clinical practice, it is essential to standardize the treatment protocol for this vulnerable population in the future.
To maximize population homogeneity and enhance prediction accuracy, patients with potential confounders significantly associated with the defined endpoint were excluded, such as those who received WBRT, ASCT, or BTK inhibitors. 30 Additionally, prebiopsy MR scans were analyzed to avoid inaccuracies in ROI segmentation related to postbiopsy bleeding. The use of the T1CE sequence for radiomic parameter extraction allowed for the identification of small foci and ensured the stability of manual segmentation in clinical practice, particularly in cases with multiple lesions.
Another contribution of our work is the development of parametric models combining clinical and radiomic signatures for predicting DR to HD-MTX-based chemotherapy in PCNSL patients. These integrated models also performed well upon external validation, surpassing the clinical models. This result was expected, given that the two clinical models were initially designed to predict PFS for PCNSL patients, irrespective of the homogeneity of the treatment approach. 19,20 Therefore, our radiomic signature appears particularly well suited to predicting outcomes in the specific subset of PCNSL patients who have exclusively undergone HD-MTX-based chemotherapy. While neither the Radscore-IELSG model nor the Radscore-MSKCC model significantly outperformed the radiomic signature, DCA showed that the clinical utility of the Radscore-IELSG model was more likely to be stable than that of the other models in both the training and validation cohorts. This suggests that some MR image features may be inherently linked to clinical factors, so that adding clinical factors contributes minimally to the performance of the integrated models. Overall, the results at least indicate the practicality and feasibility of radiomic-clinical integrated models. We further developed a user-friendly online tool based on the Radscore-IELSG model that has the potential to be translated into clinical practice if adequately validated. Our online application may offer valuable insights, particularly in identifying patients who are likely to benefit from HD-MTX-based chemotherapy. If a PCNSL patient is predicted to have a high probability of DR to HD-MTX-based chemotherapy, administration of such a regimen is advisable. Conversely, for patients predicted to have a low probability of DR, early switching to alternative therapeutic strategies, such as chemotherapy-free options, 9 is recommended. Overall, this tool facilitates early decision-making tailored to individual therapeutic strategies.
There were several limitations in this study. First, potential bias in sample selection and image data acquisition may have occurred due to the retrospective study design. Second, despite the use of multiple statistical methods to control for confounders, the diversity of HD-MTX-based chemotherapy regimens inevitably introduces potential bias in outcome prediction. Furthermore, the heterogeneity of treatment regimens may affect the distribution of patients within the two groups, and thus the use of DR/LoDR as the endpoint in this analysis warrants caution. Therefore, achieving consensus on the standard treatment of PCNSL is crucial to reduce error in predictive model development in the future. Third, given the rarity of PCNSL, the total number of patients included in our study appears relatively large. However, potential imbalance and biases exist due to the relatively small sample size of the validation cohort (n = 29). For example, baseline parameters between the training cohort and validation cohort might have differed significantly if a larger number of patients had been enrolled. Moreover, because of the small sample size of the validation cohort, our results may be inconclusive despite adjusting for chemotherapy regimens and other baseline characteristics through multivariable analysis. Fourth, this study was conducted in a single region, and the generalizability of the prediction models to other institutions or countries has not been verified. Fifth, although the radiomic signature from the single T1CE sequence provided strong prediction performance, our model may be further improved by integrating multimodality imaging data in the future. Sixth, a fully automated pipeline for tumor segmentation should be applied for PCNSL to reduce the subjective bias from manual processing, which would facilitate translation into clinical practice. 31 Finally, despite the inclusion of a relatively large sample with independent training and validation cohorts, a prospective design and deep-learning algorithms might help improve the prediction performance of the model in the future.

| CONCLUSION
In conclusion, we developed and validated a radiomic signature utilizing prebiopsy MR images, with favorable performance in predicting the durable response of PCNSL to HD-MTX-based chemotherapy. Our radiomic-clinical integrated model shows promise for clinical practice, offering potentially valuable insights to optimize treatment strategies and guide clinical decision-making.

Clinical characteristics Overall Cohort (n = 105) Training cohort (n = 76) Validation cohort (n = 29) p-value
Associations between covariables and durable response to treatment by generalized multivariable logistic regression.

| Radiomic signature analysis

| Radiomic feature extraction
Each patient's MRI was preprocessed with N4 bias field correction. Each scan was normalized with z-scores to obtain a standard normal distribution of imaging intensities and resampled to the same resolution of 1 × 1 × 1 mm³ voxels. Radiomic feature extraction was conducted using PyRadiomics (v3.1.0; Python Software Foundation). 22
For each patient, a total of 1218 imaging features were automatically extracted, covering 18 first-order statistical features, 14 shape-based features, 68 texture features, and 1118 wavelet features. The first-order statistical features, shape-based features, and texture features were extracted from the original image. Texture features included 22 gray level cooccurrence matrix, 16 gray level run length matrix, 16 gray level size zone matrix, and 14 gray level dependence matrix features. The wavelet features were extracted from images filtered by the wavelet transform and Laplacian of Gaussian. Features were normalized with z-scores after extraction. The ComBat feature harmonization method was used to correct for the batch effect introduced by different scanners and acquisition protocols.
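The feature counts stated above can be checked for internal consistency, and the z-score normalization step can be illustrated in a few lines. This is a sketch only: the dictionary keys and function name are illustrative, and the use of the population standard deviation is an assumption, since the text does not specify which estimator was used. N4 correction, resampling, and ComBat harmonization are not sketched here.

```python
from statistics import mean, pstdev

# Feature counts as stated in the text.
FEATURE_COUNTS = {"first_order": 18, "shape": 14, "texture": 68, "wavelet": 1118}
TEXTURE_COUNTS = {"GLCM": 22, "GLRLM": 16, "GLSZM": 16, "GLDM": 14}

# The breakdown is consistent with the reported totals.
assert sum(FEATURE_COUNTS.values()) == 1218
assert sum(TEXTURE_COUNTS.values()) == FEATURE_COUNTS["texture"]


def z_score(values):
    """Normalize one feature across patients to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(z_score([1.0, 2.0, 3.0]))
```

When a model is validated externally, the mean and standard deviation are usually estimated on the training cohort and reused for the validation cohort; the text does not state how this was handled here.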
TABLE 4 Performance of the established models. Abbreviations: AUC, area under the curve; CI, confidence interval; IELSG, International Extranodal Lymphoma Study Group; MSKCC, Memorial Sloan-Kettering Cancer Center; NPV, negative predictive value; PPV, positive predictive value.
TABLE 5 Comparison of clinical use among the established models.