Cluster analysis of autoencoder-extracted FDG PET/CT features identifies multiple myeloma patients with poor prognosis

F-18 fluorodeoxyglucose positron emission tomography/computed tomography (FDG PET/CT) is a robust imaging modality used for staging multiple myeloma (MM) and assessing treatment responses. Herein, we extracted features from the FDG PET/CT images of MM patients using an autoencoder, an artificial intelligence algorithm that constructs a compressed representation of input data. We then evaluated the prognostic value of clusters derived from those image features. Conventional image parameters, including metabolic tumor volume (MTV), were measured on volumes of interest (VOIs) covering only the bones. Features were extracted with the autoencoder from the same bone-covering VOIs. Supervised and unsupervised clustering were performed on the image features. Survival analyses for progression-free survival (PFS) were performed for the conventional parameters and the clusters. Supervised and unsupervised clustering of the image features each grouped the subjects into three clusters (A, B, and C). In multivariable Cox regression analysis, unsupervised cluster C, supervised cluster C, and high MTV were significant independent predictors of worse PFS. Supervised and unsupervised cluster analyses of image features extracted from FDG PET/CT scans of MM patients by an autoencoder allowed significant and independent prediction of worse PFS. Therefore, artificial intelligence algorithm–based cluster analyses of FDG PET/CT images could be useful for MM risk stratification.

www.nature.com/scientificreports/ data 8 . A previous study reported an autoencoder that offered a fair diagnostic performance for lung nodule classification 9 . Another study showed the usefulness of an autoencoder for predicting metabolic changes in FDG PET of the brain 10 . However, the ability of an autoencoder to analyze FDG PET/CT images of MM patients has not been previously explored.
In this work, we investigated the prognostic value of image features extracted from FDG PET/CT scans of MM patients by a convolutional autoencoder. Supervised and unsupervised clustering were tested on the extracted PET/CT features. The prognostic value of the extracted feature clusters was then compared with that of MTV measurements.

Methods
Subjects and clinical data. The study candidates were 254 MM patients who underwent FDG PET/CT as part of the initial workup at our institution between January 2010 and December 2021. Of these, 26 patients were excluded because they were lost to follow-up before completing treatment, and 14 patients who received only surgery or local radiotherapy were also excluded. Six patients were excluded because cytogenetic studies or serum lactate dehydrogenase (LDH) measurements were unavailable. An additional 17 cases were excluded because of misregistration between the PET and CT images or imaging in a different posture (arm elevation). Therefore, 191 patients were included in this study (Fig. 1). The institutional review board of Samsung Medical Center approved this retrospective study (IRB #2022-10-106) and waived the requirement for written informed consent because of the retrospective design. All methods were carried out in accordance with relevant guidelines and regulations.
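The selection arithmetic above can be checked in a few lines. This is only a bookkeeping sketch: the counts come from the text, while the dictionary keys are illustrative labels rather than terms used by the authors.

```python
# Counts are taken from the text; the key names are illustrative labels only.
candidates = 254
exclusions = {
    "lost to follow-up before completing treatment": 26,
    "surgery or local radiotherapy only": 14,
    "missing cytogenetics or serum LDH": 6,
    "PET/CT misregistration or different posture": 17,
}

# Subjects remaining after all four exclusion steps.
included = candidates - sum(exclusions.values())
print(included)
```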
All clinical information and laboratory results were acquired from electronic medical records. As clinical information, we collected sex, age, date of diagnosis, chemotherapeutic regimen, history of autologous stem cell transplantation (ASCT), date of relapse or progression, and date of death or last follow-up. We also collected the following laboratory results: serum albumin, serum β2 microglobulin, serum LDH, serum calcium, serum creatinine, hemoglobin, and cytogenetic study results. The Revised Multiple Myeloma International Staging System (R-ISS) stage of each patient was determined using the laboratory test results 11 .
Figure 1. Scheme for selecting study subjects. Candidates retrospectively enrolled were 254 multiple myeloma patients who underwent FDG PET/CT for initial staging. Among them, 26 cases were excluded due to follow-up loss without completion of treatment, and 14 were excluded because they received only local treatment. An additional 6 cases were excluded from analysis because information on the R-ISS stage was unavailable, and 17 cases were excluded due to technical problems. Therefore, 191 patients were included as study subjects for analysis. R-ISS, revised multiple myeloma international staging system.
FDG PET/CT image analysis. The image analysis scheme is illustrated in Supplementary Fig. 1. FDG PET/CT images were retrieved to MIM Encore version 7.0 (MIM Software Inc., Cleveland, OH). On the CT images, a volume of interest (VOI) containing voxels with Hounsfield units (HU) above 150 was drawn from the skull base to the upper thigh. This thresholding included non-skeletal high-attenuation structures such as calcifications or foreign materials and excluded some osteolytic lesions with HU below 150. Therefore, an experienced nuclear medicine physician carefully reviewed and manually corrected all the VOIs to include the entire skeleton and exclude non-skeletal high-attenuation structures. The VOIs and PET images were saved in DICOM format.
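As a rough illustration of the initial thresholding step only, the sketch below marks voxels above 150 HU on a toy CT slice. The function name and the toy values are hypothetical; the actual VOIs were drawn in MIM Encore and then manually corrected by a physician, which no simple threshold reproduces.

```python
HU_THRESHOLD = 150  # threshold stated in the text

def initial_bone_mask(ct_slice):
    """Mark voxels whose attenuation is strictly above the HU threshold.

    ct_slice is a list of rows of HU values (hypothetical toy input).
    """
    return [[hu > HU_THRESHOLD for hu in row] for row in ct_slice]

# Toy 2 x 3 slice: air, soft tissue, bone / bone, exactly 150 HU, dense bone.
ct_slice = [[-1000, 40, 300],
            [160, 150, 1200]]
mask = initial_bone_mask(ct_slice)
n_selected = sum(sum(row) for row in mask)
print(mask, n_selected)
```

A voxel at exactly 150 HU is left out here because the text says "above 150". Osteolytic lesions below the threshold would also be missed at this stage, which is why the manual correction described above is needed.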
From the VOIs, the conventional maximum standardized uptake value (SUVmax) and mean SUV (SUVmean) were obtained. The volumetric metabolic parameters, MTV and total lesion glycolysis (TLG), were measured using an SUV of 2.5 as the threshold.
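The volumetric parameters can be sketched as follows, using the common definitions of MTV as the volume of voxels at or above the SUV threshold and TLG as MTV multiplied by the mean SUV of those voxels. Whether the commercial software treats the threshold as inclusive, and the voxel volume used here, are assumptions of this sketch.

```python
def volumetric_parameters(suv_values, voxel_volume_cm3, threshold=2.5):
    """Return (MTV in cm^3, TLG) for SUVs sampled inside a bone VOI.

    Assumption: voxels with SUV >= threshold count toward the tumor volume.
    """
    above = [v for v in suv_values if v >= threshold]
    mtv = len(above) * voxel_volume_cm3
    mean_suv_above = sum(above) / len(above) if above else 0.0
    tlg = mtv * mean_suv_above
    return mtv, tlg

# Hypothetical VOI with four voxels of 0.5 cm^3 each.
suv_values = [1.0, 2.5, 3.5, 6.0]
mtv, tlg = volumetric_parameters(suv_values, voxel_volume_cm3=0.5)
suv_max = max(suv_values)
suv_mean = sum(suv_values) / len(suv_values)
print(mtv, tlg, suv_max, suv_mean)
```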
Feature extraction. The VOIs and PET images were loaded into Python version 3.9 via the pydicom and rt_utils libraries. Voxels covering only bones were extracted using the previously drawn VOI as a mask. Anterior-view maximum intensity projection (MIP) images were then constructed, uniformly resized to 64 pixels in width and 128 pixels in height, and used as input data for the convolutional autoencoder that performed the feature extraction. The encoder part of the convolutional autoencoder was composed of 3 convolutional layers and 3 max-pooling layers. The sizes of all the convolutional filters were set to 3 × 3, and the filter numbers were set to 16, 20, and 24. Max-pooling was applied on each 2 × 2 region. The decoder part was composed of 4 convolutional layers and 3 up-sampling layers. The sizes of all the convolutional filters were set to 3 × 3, and the filter numbers were set to 24, 20, 16, and 1. Up-sampling was applied on each 2 × 2 region. The ReLU function was used as the activation function in all the encoder layers and the first three decoder layers. The sigmoid function was used as the activation function in the last decoder layer. The input data were divided into training and test sets with a 1:1 ratio. An Adam optimizer was adopted, and the learning rate was 0.01. The convolutional autoencoder was trained with a batch size of four for 100 epochs. The trained convolutional autoencoder extracted 3072 features from each MIP image.
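The figure of 3072 features is consistent with the encoder geometry described above, as the following shape trace suggests. It assumes that each 3 × 3 convolution preserves height and width ("same" padding) and that each convolution is followed by one 2 × 2 pooling step; neither detail is stated in the text, and the function names are illustrative.

```python
def encoder_output_shape(height, width, filters, pool=2):
    """Trace (height, width, channels) through conv + max-pool stages.

    Assumption: 3 x 3 convolutions use "same" padding, so only pooling
    changes the spatial size.
    """
    channels = 1
    for n_filters in filters:
        channels = n_filters  # the convolution sets the channel count
        height //= pool       # 2 x 2 max-pooling halves each dimension
        width //= pool
    return height, width, channels

def decoder_output_shape(height, width, filters, n_upsamplings=3, factor=2):
    """Trace the decoder: each up-sampling doubles the spatial size."""
    scale = factor ** n_upsamplings
    return height * scale, width * scale, filters[-1]

latent = encoder_output_shape(128, 64, (16, 20, 24))
n_features = latent[0] * latent[1] * latent[2]
restored = decoder_output_shape(latent[0], latent[1], (24, 20, 16, 1))
print(latent, n_features, restored)
```

Under these assumptions the encoder output is 16 × 8 × 24, which flattens to 3072 values, and the decoder restores a single-channel 128 × 64 image.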
Those 3072 extracted features were then subjected to supervised and unsupervised clustering. For the unsupervised clustering, K-means clustering was performed in Python version 3.9. The number of clusters was set to three, and the maximum number of iterations was set to 300. For the supervised clustering, the survClust package in R software was used 12 . It provides an outcome-weighted clustering algorithm for survival stratification of multidimensional data. To avoid overfitting, the supervised clustering was performed in 10 rounds of threefold cross-validation to classify the subjects into three clusters via the cv.survclust function. A principal component analysis (PCA) was performed to visualize the clustering results using the PCA function in the factoextra package for R software.
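For readers unfamiliar with K-means, a minimal pure-Python version of Lloyd's algorithm with three clusters and at most 300 iterations is sketched below. It is not the authors' code, and which K-means library they used is not stated. The deterministic farthest-first initialization is a simplification chosen so that the toy example is reproducible, and the two-dimensional points stand in for the 3072-dimensional feature vectors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic initialization (a simplification for this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k=3, max_iter=300):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical 2-D stand-ins for the feature vectors.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
labels, centroids = kmeans(points, k=3, max_iter=300)
print(labels)
```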

Statistical methods and survival analysis. Shapiro-Wilk tests were performed to evaluate the normality of the conventional FDG parameters: SUVmax, SUVmean, MTV, and TLG. Kruskal-Wallis tests were used to compare those parameters among the different cluster groups. Two-sided P values < 0.05 were deemed statistically significant. All statistical analyses were performed using R software (v. 4.0.4, https://www.R-project.org/, R Foundation for Statistical Computing, Vienna, Austria) 13 .
PFS was the endpoint of our survival analysis. Events were MM relapse, progression, or death from any cause. Relapse and progression were determined from bone marrow examinations or the results of serial laboratory tests. PFS was defined as the duration from the date of initial diagnosis to the date of an event or last follow-up. The median follow-up was 665 days (range, 50-3272 days). An observation was considered right-censored and analyzed accordingly if the subject had not had an event by the end of follow-up or was lost to follow-up.
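The PFS definition and the censoring rule can be expressed as a small helper. The function name and all dates below are hypothetical; only the rule itself (time from diagnosis to an event, or to last follow-up if no event was observed) comes from the text.

```python
from datetime import date

def pfs_record(diagnosis, last_follow_up, event_date=None):
    """Return (duration_in_days, event_observed) for one subject.

    event_date is the date of relapse, progression, or death, if any.
    Subjects without an event are right-censored at last follow-up.
    """
    if event_date is not None:
        return (event_date - diagnosis).days, True
    return (last_follow_up - diagnosis).days, False

# Hypothetical subjects.
with_event = pfs_record(date(2015, 1, 1), date(2017, 6, 1),
                        event_date=date(2016, 1, 1))
censored = pfs_record(date(2015, 1, 1), date(2015, 2, 20))
print(with_event, censored)
```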
Age was divided into three ranges by tertiles, with cutoff points at 58 and 67 years. MTV was categorized as low, moderate, or high based on the cutoff values that best discriminated PFS, as explored with the surv_cutpoint function in the survminer package of R software. The resulting cutoffs of 22.0 and 83.6 cm³ divided MTV into three ranges.
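Applying the reported cutoffs is a simple binning step, sketched below. The cutoff values come from the text; the group labels for age and the handling of values exactly equal to a cutoff (placed in the lower group here) are assumptions, and the derivation of the cutoffs by surv_cutpoint is not reproduced.

```python
from bisect import bisect_left

MTV_CUTOFFS = [22.0, 83.6]   # cm^3, from the text
MTV_LABELS = ["low", "moderate", "high"]
AGE_CUTOFFS = [58, 67]       # years, tertile boundaries from the text
AGE_LABELS = ["lower tertile", "middle tertile", "upper tertile"]  # illustrative

def categorize(value, cutoffs, labels):
    """Assign value to a group; values equal to a cutoff fall in the
    lower group (an assumption of this sketch)."""
    return labels[bisect_left(cutoffs, value)]

print(categorize(50.0, MTV_CUTOFFS, MTV_LABELS))
print(categorize(70, AGE_CUTOFFS, AGE_LABELS))
```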
Kaplan-Meier analysis with log-rank tests and univariate Cox proportional hazards models with hazard ratios (HRs) were used to evaluate the prognostic power of the major variables: sex, age, R-ISS stage, chemotherapy regimen, radiotherapy, ASCT, hypercalcemia, renal insufficiency, anemia, and the presence of bone lesions or extramedullary disease (EMD). Unsupervised cluster, supervised cluster, and MTV were also included in the survival analysis. Furthermore, multivariate Cox analysis was performed using variables with log-rank test P values < 0.05 in the univariate analysis. Because of potential multicollinearity among the unsupervised cluster, supervised cluster, and MTV groups, the multivariate survival analysis was repeated while including just one of those three variables at a time.
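The Kaplan-Meier estimate used in these analyses follows the product-limit rule: at each event time the running survival probability is multiplied by one minus the fraction of at-risk subjects who had an event. The sketch below is a minimal standalone version with hypothetical data; the authors performed their analyses in R, and the log-rank test and Cox models are not reproduced here.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up times; events: True if an event was observed,
    False if the observation was right-censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same time; censored subjects
        # still count as at risk at that time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up times (days) and event indicators.
curve = kaplan_meier([5, 8, 8, 12, 15], [True, False, True, True, False])
print(curve)
```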

Results
Demographic data. The clinical characteristics and demographics of the study subjects are described in
Clustering based on extracted features. Supervised and unsupervised clustering were performed on the 3072 features extracted by the convolutional autoencoder. The pattern of clustering produced by the supervised and unsupervised methods was similar. Figure 2 illustrates the clusters as PCA plots. Both the supervised and unsupervised clustering sorted 75, 63, and 21 subjects into clusters A, B, and C, respectively. Representative FDG PET MIP images of five subjects in each cluster are displayed in Fig. 3. Those subjects were sorted into the same clusters by both the supervised and unsupervised methods. The subjects in clusters A, B, and C displayed an increasing extent of high FDG uptake in that order. Thus, SUVmax, SUVmean, MTV, and TLG were signifi-

Prognostic validation of clustering. Kaplan-Meier survival analysis demonstrated that R-ISS stage, ASCT, unsupervised clusters, supervised clusters, and MTV were significant predictors of PFS (Fig. 6). Specifically, R-ISS stage III was associated with worse PFS than stage I (P = 0.003) and stage II (P = 0.009), but there was no significant difference between R-ISS stages I and II. Similarly, both supervised and unsupervised cluster C were associated with worse PFS than clusters A and B, but there was no significant difference between clusters A and B. High MTV was associated with worse PFS than low or moderate MTV (both P < 0.001), with no significant difference between low and moderate MTV. Univariate Cox regression analysis showed that age, R-ISS stage, anemia, chemotherapy regimen, ASCT, unsupervised cluster, supervised cluster, and MTV were significant prognostic factors for PFS (Table 2). Multivariate Cox regression analysis was performed with those significant univariate prognostic factors, including only one of supervised cluster, unsupervised cluster, or MTV at a time (Table 3). Chemotherapy regimen was not included in the analysis because of multicollinearity with ASCT history. The results show that ASCT was a significant independent prognostic factor in all instances (all P < 0.001). In addition, unsupervised cluster C (HR = 2.68; P < 0.001), supervised cluster C (HR = 2.42; P = 0.002), and high MTV (HR = 2.01; P < 0.001) were independent prognostic factors when they were individually included in the analysis (Table 3).
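For interpreting the hazard ratios reported above, it may help to recall the standard form of the Cox proportional hazards model, in which the hazard ratio of a covariate is the exponential of its regression coefficient. The symbols below are generic notation, not taken from the source.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

Under this model, an HR of 2.68 for unsupervised cluster C corresponds to an estimated hazard of a PFS event 2.68 times that of the reference group at any given time, with the other covariates held fixed. The reference category is not stated in this section.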
Given the strong prognostic association of ASCT, we performed an additional survival analysis in subgroups according to this variable. In subjects who had not received ASCT, the Kaplan-Meier analysis demonstrated that unsupervised cluster C, supervised cluster C, and high MTV were significant predictors of worse PFS (Supplementary Fig. 2). In that subgroup, Cox regression analysis showed that R-ISS stage, unsupervised cluster, supervised cluster, and MTV were significant univariate prognostic factors for PFS (Supplementary Table 1). Multivariate Cox regression analysis demonstrated that unsupervised cluster C (HR = 3.36), supervised cluster C (HR = 4.07), and MTV (HR = 2.83) were independent prognostic factors (Supplementary Table 2).
In contrast, none of those variables was significantly associated with PFS in subjects who received ASCT (Supplementary Fig. 2).

Discussion
In this study, we have shown that autoencoders can successfully extract image features from FDG PET/CT scans of MM patients. The supervised and unsupervised cluster groups contained different patterns of bone metabolic activity as demonstrated by SUV, MTV, and TLG measurements. The Kaplan-Meier survival analysis and Cox regression models revealed that both the supervised and unsupervised clusters were significantly associated with patient outcomes.
FDG PET/CT provides high-contrast whole-body images of glucose metabolism and is a robust diagnostic modality for staging and treatment response assessment in various malignant diseases 14 . In patients with MM, FDG PET/CT has higher sensitivity for bone lesion detection than conventional radiography and offers findings concordant with magnetic resonance imaging (MRI) 15,16 . Moreover, the bone lesions and EMD detected by this modality are prognostic predictors of survival outcomes 5,17 . Previous studies have indicated that estimations of tumor burden made from FDG PET/CT images can be used for risk stratification. Thus, MTV and TLG, which represent the tumor burden, can provide particularly useful prognostic information about MM patients 6,18 . Selecting the bone on FDG PET/CT images to measure the tumor burden in the skeletal system can be achieved automatically and precisely with the aid of artificial intelligence (AI). We are aware of only two previous studies that investigated the application of AI to PET/CT analyses in MM patients. One was the study of Xu et al., who used AI to detect bone lesions on Ga-68 Pentixafor PET/CT images 19 . The other was by Mesguich and coworkers, who used machine learning during model construction to diagnose diffuse infiltrative disease with radiomic FDG PET/CT parameters. However, their study did not use AI for feature extraction 20 . To the best of our knowledge, this study is the first to use AI to extract image features from FDG PET/CT scans of MM patients.
In this study, we successfully extracted image features from FDG PET/CT scans using a convolutional autoencoder. Convolutional autoencoders have several advantages for analyzing medical images such as CT, MRI, and FDG PET/CT, which generate large data files with many pixels and thus require compression into essential features for further analysis. Radiomic features are generally an excellent way to represent and characterize the patterns in medical images 21 . However, it is difficult to compress and reproduce input data via hand-crafted radiomic parameters such as texture parameters 22 . In this study, an AI algorithm was utilized to extract image features with superior reproducibility and low redundancy. Among different feature extraction AI algorithms, convolutional autoencoders have been shown to provide better classification accuracy than other methods 23 and were thus employed in our study.
Unsupervised clustering of FDG PET image features with k-means algorithms was confirmed to be a straightforward method that could be routinely applied in clinics. In this study, we also tested supervised clustering with the survClust algorithm, which successfully provided outcome-weighted clustering of survival data 12 . The supervised algorithm weighs each feature by incorporating a vector of the Cox regression hazard ratio. Clustering membership was explored by integrating patient outcome-weighted distance matrices, and cross-validation was performed to achieve more reproducible clustering results without overfitting. Although this algorithm was originally introduced to analyze genomics data and find prognostic molecular subtypes, imaging data and genomics data share the common aspect of high dimensionality, so image data can also be analyzed with this algorithm. To our knowledge, this study is the first actual attempt to use survival outcome-weighted supervised clustering with imaging data.
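The idea of an outcome-weighted distance can be illustrated with a toy example in which each feature's contribution to the squared distance between two subjects is scaled by a weight derived from its univariate Cox hazard ratio. Taking the weight as the absolute log hazard ratio is an assumption of this sketch; the exact weighting used by survClust should be checked against the package documentation, and the function names and numbers below are hypothetical.

```python
import math

def weights_from_hazard_ratios(hazard_ratios):
    """Assumed weighting: |log HR|, so a feature with HR = 1 gets weight 0."""
    return [abs(math.log(hr)) for hr in hazard_ratios]

def weighted_sq_distance(a, b, weights):
    """Squared Euclidean distance with per-feature weights."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))

# Hypothetical per-feature hazard ratios and two subjects' feature vectors.
weights = weights_from_hazard_ratios([2.0, 1.0, 0.5])
distance = weighted_sq_distance((0.0, 0.0, 0.0), (1.0, 5.0, 1.0), weights)
print(weights, distance)
```

In this toy case the second feature differs most between the two subjects but carries no prognostic information (HR = 1), so it does not contribute to the distance.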
Our results show that the supervised and unsupervised clustering produced highly similar grouping patterns in both the PCA plot and survival analysis results. At the same time, our results indicate that supervised clustering can be extended to imaging data. Compared with genomics data, applying clustering to imaging features could be considered limited by the low explainability of black-box AI algorithms. In that regard, supervised clustering for radiomic data could have the advantage of better interpretability than the unlabeled features extracted by unsupervised AI algorithms.
In our results, patients belonging to cluster C had significantly worse PFS than those in the other clusters. This is consistent with the high tumor burden in the FDG PET/CT images of the patients in cluster C, which is known to be a poor prognostic factor in MM. Interestingly, the patients in cluster B showed a moderate level of tumor burden in their bones, but they did not have worse PFS than those in cluster A, who had the lowest skeletal tumor burden. MM patients with high MTV on FDG PET/CT are recognized as having a high risk of disease progression 6,18 . Our study also demonstrates significantly worse outcomes in patients with an MTV > 83.6 cm³. This is not remarkably different from the previously proposed optimum MTV cutoffs of 42.2 and 56.4 cm³ 6,18 . In our study, patients with moderate MTV did not show significantly worse PFS compared to those with low MTV, and even appeared to have a tendency for better long-term prognosis. Although the precise reason for this is not clear, it might suggest that MTV alone has limited accuracy for stratifying long-term risk of relapse unless tumor burden is sufficiently high.
[Figure caption] The logarithms of SUVmax, SUVmean, MTV, and TLG differed significantly between cluster groups (all P < 0.001). All parameters were significantly greater in patients in cluster C than in the other clusters (all P < 0.001, except for SUVmean [P ≤ 0.01]). In addition, SUVmax, MTV, and TLG were significantly greater in patients in cluster B than in cluster A (P ≤ 0.05). ns, not significant; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001. SUVmax, maximum standardized uptake value; MTV, metabolic tumor volume; TLG, total lesion glycolysis; SUVmean, mean standardized uptake value.
In addition to image feature clusters and MTV, we have demonstrated a strong association between worse PFS and not receiving ASCT, which is included in the standard of care for young patients with newly diagnosed MM. This is consistent with the findings of previous studies 24,25 . In the subgroup of patients who did not receive ASCT, cluster C and high MTV remained significant independent predictors of poor outcomes. These results imply that for MM patients who do not receive ASCT, a high tumor burden with cluster C features and high MTV indicates a high risk of disease progression or relapse, so those patients might require follow-up at shorter-than-average intervals.
The PCA plots do not display obviously distinct visual patterns between cluster groups. The MIP images of subjects show a progressively greater extent of high FDG uptake from cluster A to cluster C, but they do not show other cluster group-specific image patterns. Thus, the AI algorithms might be limited in discerning differences in image patterns, other than the extent of high FDG uptake, that characterize MM patients at high risk.
In this study, we also tried repeating our clustering analysis using images showing only the bones with SUV above 2.5 as input data to test the effect of excluding bone marrow with low FDG uptake from the input data.
Figure 5. Comparison of conventional FDG parameters according to clusters obtained by the supervised analysis. The logarithms of SUVmax, SUVmean, MTV, and TLG differed significantly between cluster groups (all P < 0.001). All parameters were significantly greater in patients in cluster C than in the other clusters, and in patients in cluster B compared with those in cluster A. ns, not significant; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001. SUVmax, maximum standardized uptake value; MTV, metabolic tumor volume; TLG, total lesion glycolysis; SUVmean, mean standardized uptake value.
Although MTV demonstrated prognostic value that was comparable to that provided by cluster analysis, there are advantages to using the clustering analysis with autoencoder-extracted features. First, whereas MTV measurement depends on selection of an optimum SUV threshold, the clustering analysis is performed on the whole skeleton with a constant anatomical reference. Second, the autoencoder-extracted features potentially contain more information about the images than simple MTV measurements provide.
This study has several limitations. One is that 2-dimensional anterior-view MIP images were used as autoencoder input, rather than 3-dimensional image data, whose massive size is difficult to handle. Further investigations of clustering analyses of 3-dimensional image data would thus strengthen the findings of this study. In addition, PET images limited to the torso were used. However, most osseous MM lesions are located between the skull base and upper thigh. Finally, clustering was performed with images of only the bones and did not include EMD lesions, but our results show that the presence of EMD was not associated with PFS.

Data availability
The data generated in this study are available upon request from the corresponding author.