Introduction

Multiple myeloma (MM) and its precursor conditions smoldering myeloma and monoclonal gammopathy of unknown significance (MGUS) outline a continuous spectrum of monoclonal plasma cell disorders. Most cases of MM are preceded by the asymptomatic, premalignant disorders smoldering myeloma or MGUS, which are commonly diagnosed incidentally in patients presenting with other conditions [1]. MM is the second most common hematologic malignancy and primarily manifests in elderly patients (median age at diagnosis 66–70 years, age-standardized incidence 5 cases per 100,000 in the Western World) [2]. Standard work-up for diagnosis of plasma cell disorders includes laboratory testing, bone marrow (BM) biopsy of the iliac crest, and whole body imaging [3]. BM biopsy is required to evaluate the degree of plasma cell infiltration, while diagnostic imaging is employed to detect myeloma defining bone lesions [3].

Discrimination of MM against its premalignant conditions is crucial for patient management and prognosis, since MM obligates for a specific therapy. Vice versa, according to the latest recommendations, MGUS is managed in a “watch-and-wait” strategy [1, 4, 5]. Diagnosis of MM by International Myeloma Working Group (IMWG) guidelines demands a BM biopsy in almost all cases [3]. In contrast to the other obligatory procedures, BM biopsy still is a painful and uncomfortable experience for most patients [6, 7]. Despite IMWG recommendations, a recent large-scale clinical analysis challenged the obligation of regular invasive BM diagnostic for patients with evidence of monoclonal antibody, since in most cases it did not contribute to the diagnosis [5].

In dual-energy computed tomography (DECT), two imaging datasets with different energy spectra are achieved during a single acquisition. The disparity of attenuation between both datasets enables post-processing of DECT images with virtual removal of certain materials. This method is especially effective for the removal of materials with high atomic numbers, e.g. calcium, and subsequent creation of virtual non-calcium images (VNCa) [8, 9]. VNCa post-processing is based on a three-compartment model, which constitutes the total attenuation of the BM in non-contrast-enhanced CT images to fat, soft tissue, and bone mineral [10]. By virtual removal of the bone mineral content, the fatty and soft tissue portion of BM attenuation can be estimated [9, 11]. By applying this technique, DECT VNCa imaging yielded a similar performance for detection of MM lesions as compared to the gold standard MRI [9, 12,13,14] and achieved an excellent prediction of metabolic activity, when compared to the benchmark PET/CT [15].

In line with recent clinical studies questioning obligatory initial BM biopsy, and considering the promising recent results of first reports on VNCa imaging for the assessment of MM, our study had two objectives: first, to demonstrate the feasibility of artificial intelligence (AI) supported, automated assessment of VNCa data to investigate its association to BM infiltration by pathology results. The second objective was to explore cutoffs based on the quantitative analysis of BM attenuation in VNCa images for the presence of osteolytic lesions, as determined by radiology report of conventional CT images, and the clinical diagnosis of MM.

Materials and methods

All procedures performed in studies involving human participants were conducted in accordance with the ethical standards of the institutional (number 20–1480) and national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was waived due to retrospective study characteristics.

Patient enrollment

Inclusion criteria to our study comprised:

  1. 1)

    Whole-body low-dose CT according to the IMWG specified imaging protocol and concurrent diagnostic BM biopsy,

  2. 2)

    No history of specific therapy for MM to the day of CT and BM biopsy,

  3. 3)

    Evidence of monoclonal protein,

  4. 4)

    Imaging between May 2018 and July 2020,

  5. 5)

    Patient age > 18 years.

Exclusion criteria were:

  1. 1)

    History of secondary malignoma with requirement for specific therapy or bone involvement (n = 3),

  2. 2)

    Metal implants in the spine (n = 2).

Assessment of clinical data

The quantitative infiltration rate of BM biopsies as per pathology report was noted. Each patient was assigned to the MGUS, smoldering myeloma and MM group based on IMWG recommendations.

DECT image acquisition

All scans were performed on a commercially available spectral detector DECT scanner (IQon Spectral CT, Philips Healthcare), following the most recent recommendations of the IMWG [16]. Scans were unenhanced. Scan parameters were as follows: tube voltage 120 kV; tube current 70 mAs; collimation 32 × 0.625 mm; pitch 0.908; volumetric computed tomography dose index 7.4 mGy. Mean dose length product was 1091.7 ± 230.5 mGy*cm.

DECT image reconstruction

All images were constructed in a 512 × 512 matrix, slice thickness 2 mm with an overlap of 1 mm. VNCa images were created by the vendor’s software in order to simulate each voxel’s attenuation in Hounsfield units (HU) without the calcium-specific contribution (IntelliSpace Portal, Spectral Diagnostics Suite, Philips Healthcare) [17]. In our study, calcium suppressed images were calculated with a high suppression index (index 25), as suggested by earlier results [15]. Detailed information about VNCa imaging has been provided in earlier studies [8, 18].

Segmentation of the BM and assessment of DECT data

All 35 three-dimensional CT datasets were roughly cropped to a longish cuboid containing the thoracolumbar spine. Automated segmentation of the spine was achieved by a pre-trained convolutional neural network [19, 20]. Seventeen vertebrae counting from bottom upwards were marked as a volume of interest (VOI) by a Python script. The SciPy command “scipy.ndimage.binary_erosion” was executed by 3 mm in order to exclude the bordering cortical bone from our VOIs, which does not contain BM (Fig. 1, panel c) [21, 22]. The VOI outlining 17 vertebrae was then automatically applied to the VNCa dataset. Our method did not require specific user interaction. Visualization was realized by the open source software 3D Slicer [23, 24].

Fig. 1
figure 1

Step-by-step automated segmentation of bone marrow. a The monoenergetic CT data served as an input to the pretrained convolutional neural network by Payer et al. b The output was limited to 17 vertebrae by a Python script, counting from the most bottom one. Segmentation after the first step included the bordering cortical bone, which was excluded by the “Shrink margin” command of 3D Slicer (c). The segmentation was then applied to the virtual non-calcium dataset and used as a mask to obtain a dataset of mere bone marrow attenuation (d)

A histogram of attenuation (range: − 1024 HU—+ 3071 HU, bin width 5) was extracted from this three-dimensionally segmented VNCa spinal cord section for each patient. BM attenuation on VNCa images was visualized in a histogram for MM and MGUS patients after standardization of the VOI size to 336.0 cm3.Batch processing of 35 CT datasets took about 3.5 h on a standard desktop computer (processor: Intel ® Core™ i9-9980HK CPU with 2.4 GHz clock frequency).

Analysis of the attenuation histograms

After virtual removal of the bone mineral portion during VNCa post-processing, the remaining BM attenuation consists of the fatty and soft-tissue portion, as introduced by the three-compartment model above. In order to estimate the relative amount of fatty and soft-tissue compartments, the number of voxels > 0 HU in the VNCa data was divided by the total volume of the segmented BM, resulting in the non-fatty attenuating portion of BM for each patient. This threshold of 0 HU was adopted from earlier studies for discrimination of physiological and infiltrated BM in MM [9]. The peak of BM attenuation below − 1000 HU apparently resulted from calcium removal from densely calcified structures, e.g., cortical bone or bone islands, which do not make up the BM space. Therefore, for calculation of the total volume of BM, only voxels with attenuation >  − 1000 HU were considered.

Texture analysis of VNCa BM images

A supplementary radiomics analysis was performed by extracting textural features of the VNCa post-processed BM space, which is illustrated in detail in supplementary data 1 [25].

Bone mineral density measurements

Bone mineral density (BMD) was quantified by a CE-certified software for phantom-less, in-body calibrated measurements (IntelliSpace, Philips Healthcare) [26,27,28,29]. BMD measurements were performed on the first to third lumbar vertebrae, as outlined in the software’s manual. In case of focal osteolytic bone lesions or vertebral fractures, measurements were extended to the lower vertebral spine (n = 3 patients). Density measurements in the paravertebral muscles (erector spinae muscle) and the subcutaneous fat tissue served as a spectrometric calibration. BMD measurements were repeated by two independent, blinded radiologists with 3 and 4 years of experience to assess inter-reader agreement.

Visual assessment of CT data

Conventional CT images were independently screened by three blinded radiologists (3, 3, and 4 years of experience) for myeloma defining osteolytic lesions, as outlined by the IMWG [16]. Each CT was declared positive or negative for myeloma defining osteolytic lesions by majority vote.

Statistical assessment

Statistical analysis was performed in R language for statistical computing, R Foundation, version 4.0.0. Shapiro–Wilk’s test was performed to test the data for normal distribution, using the R library dplyr [30]. Receiver operating characteristic analysis was carried out by the R library pROC with the predictor “non-fatty portion of BM in VNCa” and the binary outcomes “evidence of at least one MM defining osteolytic lesion” and “clinical diagnosis of MM by IMWG criteria” [31].

Inter-rater reliability of BMD measurements was reported by the intraclass correlation coefficient (ICC) in a single rater type, two-way random-effects model (ICC2), using the R library irr (supplementary data 2) [32, 33].

Appropriate sample size of the multivariate regression was calculated by the software G*Power for a desired power level of 0.8, significance level of 0.05, and an estimated medium to large effect size (f2 = 0.32) of the main independent variable, based on reported data (supplementary data 3) [34, 35].

Statistical significance was defined as p ≤ 0.05. Data is stated as mean ± standard deviation, if not otherwise specified.

Results

Patient enrollment resulted in a study population of 35 individuals, after exclusion of five patients (Fig. 2). Mean age was 64.6 ± 12.4 years. Eighteen patients were female, and 17 male. Twenty patients initially presented with MM, 14 with MGUS, and one smoldering myeloma. For the subsequent analysis, the patient with smoldering myeloma was included in the MM subgroup, since BM Infiltration was above the MGUS threshold (≥ 10%). BM biopsy and DECT scan were separated by a median of 4.5 days [interquartile range 1.0–12.5 days]. In the MM subset, median BM infiltration was 40% [interquartile range 12.5–70.0] and median BMD was 96.8 mg/ml [interquartile range 89.6–110.4]. Throughout MGUS patients, median BM infiltration was 0% [interquartile range 0–0] and median BMD was 95.6 mg/ml [interquartile range 82.7–106.1]. Patient characteristics are summarized in Table 1.

Fig. 2
figure 2

Inclusion chart of the study population

Table 1 Patient characteristics

Results of automated segmentation

In 34 out of 35 patients, 17 consecutive vertebrae were identified by the neural network and our Python algorithm. In one patient, instead of the second thoracic vertebra, the seventh cervical vertebra was segmented. In 29 out of 35 patients, the thoracolumbar spine was segmented anatomically correct (Th1-L5). In four patients, segmentation started at the second thoracic vertebra (Th2-S2), apparently due to lumbosacral transitional vertebrae (Fig. 3). In one case, segmentation started at the seventh cervical vertebra (C7-L4). The segmentation results were not manually corrected, in order to prove feasibility of the automated workflow of our method and enable batch processing. Mean volume of the segmented BM space was 336.0 ± 99.3 cm3. Since our study population comprised patients at first presentation, there was no advanced stage of spinal destruction due to myeloma bone disease. Even though the neural network was trained with healthy individuals only, the automated segmentation process respected the present spinal osteolytic lesions (Fig. 3) [19, 20].

Fig. 3
figure 3

Results of the automated segmentation of the thoracolumbar spine. Exemplary sagittal and axial slices of non-contrast CT with green overlay of AI-supported segmentation of three patients with evidence of monoclonal antibody (a, b and c/d). In 34 out of 35 patients, 17 consecutive vertebrae were segmented. In 29 patients the thoracolumbar spine was annotated correctly (a). The most common misclassification was the segmentation of Th2-S1 instead of Th1-L5, due to lumbosacral transitional vertebrae (four patients, b). Segmentation results were not manually altered in order to exclude bias by specific user interaction throughout our automated method. The convolutional neural network by Payer et al. was trained with healthy individuals only. However, osteolytic bone lesions were well respected and spared from the segmentation (white arrowheads, c/d)

Analysis of the attenuation histograms

The peak of the MGUS histogram was higher (5349 vs. 4876 standardized voxels) and shifted towards negative HU values (− 215 HU vs. − 185 HU), when compared to the MM group (Fig. 4).

Fig. 4
figure 4

Histogram of bone marrow (BM) attenuation on virtual non-calcium CT images. Voxel counts of smoldering myeloma/multiple myeloma (MM) patients and patients with history of monoclonal gammopathy of unknown significance (MGUS) were plotted to the histogram. Automatically segmented BM was standardized to a volume of 336.0 cm3; thus, both plots cover the same area under the curve. The MGUS peak is higher and shifted towards negative attenuation, when compared to the MM peak. The non-fatty portion of the BM > 0 HU was larger throughout MM patients. The peak at the left end of both histograms can be explained by the virtual calcium removal, which results in extreme negative attenuation of structures with dense calcification. Further, the large overlap of both histograms in the range of negative HU values might reflect the resemblance of bone mineral density of both subsets: after virtual calcium removal, the bone mineral density might crucially influence the negative part of the attenuation histogram

The non-fatty portion of the BM (> 0 HU) on three-dimensional VNCa data ranged from 0.02 to 6.01%. The median portion of non-fatty BM (> 0 HU) in patients with evidence of osteolytic lesions on conventional CT (n = 13) was 1.33% [1.07–4.25%]. The corresponding portion for MM patients without osteolytic lesions (n = 7) was 1.18% [0.64–1.89%]. The non-fatty portion of BM in patients without evidence of osteolytic lesions was generally lower; however, this difference remained non-significant (Wilcoxon test, p = 0.41). To further investigate the relationship between the non-fatty portion of BM and BM infiltration, a multivariate regression analysis was conducted (Fig. 5). Since it is known that the BMD interacts with the vertebral bone marrow fat content and the BM infiltration in MM, it was included as a control variable [36, 37]. This ensures that the effect of the main independent variable, the non-fatty portion of BM, is not overestimated or driven by BMD. For modelling of the regression, the dependent variable was logit transformed to account for its bounded nature [38]. The regression results showed that non-fatty portion of BM is a significant predictor of the BM infiltration (p = 0.007, r = 0.46). The effect of the control variable BMD was not significant (p = 0.30). Variance inflation factors below 1.5 excluded collinearity among predictor variables in the model [39]. A Breusch-Pagan test indicated the presence of heteroskedasticity in the residuals (p < 0.05). This problem has been addressed by the use of robust standard errors according to White [40].

Fig. 5
figure 5

Multivariate regression of the non-fatty portion of bone marrow in virtual non-calcium CT and the biopsy-determined bone marrow infiltration. The association was highly significant (p = 0.007) and moderately strong (r = 0.46), after the inclusion of bone mineral density as a control variable. The regression line of the model-based analysis is plotted in black with a gray band marking the 95% confidence interval

The association of the non-fatty portion of BM with the degree of BM infiltration by pathology report is illustrated in Fig. 6.

Fig. 6
figure 6

Spine infiltration by multiple myeloma (MM) displaces the fatty bone marrow (BM). Four patients (ad) with parallel dual-energy CT (DECT, top row) and CD138-immunostained BM biopsy in 200 × amplification (bottom row) are presented as an example. The thoracolumbar spine was automatically segmented by a convolutional neural network. The red overlay on the DECT slices marks voxels with attenuation > 0 HU in virtual non-calcium (VNCa) post-processed images. Percentage of BM infiltration, as determined by biopsy, was 0–5%, 10–15%, 60–70%, and 90% for patients ad, respectively. With rising BM infiltration, an increase of the non-fatty attenuating portion of BM on VNCa images is visually assessable (larger patches of red overlay) and measurable (0.3%, 0.4%, 4.3%, and 5.4% for patients ad, respectively). Correspondingly, the histological images demonstrate an expansion of CD138 + , brownish stained plasma cells and a displacement of the translucent, fatty vacuoles. We hypothesize that these histological findings correspond to the raise of attenuation, which we observed on VNCa BM data

Prediction of myeloma defining lesions and the clinical diagnosis of MM by whole-spine BM attenuation

Image interpretation of the conventional CT images by three experienced radiologists found evidence of myeloma defining osteolytic lesions in 13 out of 35 patients. Thirty-one readings resulted in univocal results of the three readers; four cases were determined by majority vote.

We performed a receiver operating characteristic analysis with the predictor “non-fatty portion of BM in VNCa” and the binary outcome “evidence of at least one MM defining osteolytic lesion.” Here, the area under the curve was 0.70 [0.49–0.90]; maximum sensitivity and specificity were 0.85 (11/13) and 0.59 (13/22, Youden), respectively, applying a cutoff of non-fatty BM portion of 0.93%. However, the power level of the ROC analysis for a desired significance level of 0.05 was 0.53, only. Area under the ROC curve for prediction of the “clinical diagnosis of MM” by the “non-fatty portion of BM in VNCa” was 0.71 [0.54–0.89]. Here, the power for a desired significance level of 0.05 was 0.59. The best threshold by Youden’s method was again 0.93%. From patients that did not show myeloma defining bone lesions on conventional CT images, but who were clinically diagnosed with MM (n = 8), an above threshold non-fatty portion of BM in DECT could identify five (> 0.93, sensitivity 0.63, 5/8, specificity 0.71, 10/14).

Discussion

In order to restrict BM biopsy in patients with evidence of monoclonal antibody to the essential and obligatory minimum, we investigated VNCa imaging in DECT to non-invasively estimate BM infiltration. Automated segmentation of the BM space yielded visually excellent results. A subsequent, histogram-based analysis was capable to demonstrate a positive association between the portion of non-fatty attenuation on VNCa images and the biopsy-determined BM infiltration, remaining significant after adjusting for BMD as a control variable (p = 0.007, r = 0.46). BMD was not significantly different between the MM and MGUS subset, which we account to the early timepoint of assessment of therapy naïve patients without advanced osteolysis (Wilcoxon test, p = 0.49). Introducing a fully automated solution, our approach offers possibilities for clinical use without depleting resources for manual spine segmentation or observation by trained personal. Since DECT datasets were obtained according to IMWG guidelines, the presented technique does not require higher economic cost or radiation exposure [16].

Non-invasive estimation of BM infiltration might not only help to avoid unnecessary BM biopsy of MGUS patients, but also validate BM biopsy of MM patients, which is prone to sampling error due to the patchy morphology of MM bone infiltration [41]. While a typical BM biopsy by a 15G needle obtains a 1-cm, cylindrical tissue sample of approximately 0.003 cm3, the average volume of BM space that was analyzed in our approach is 336.0 cm3, which might be more representative of the BM infiltration status. BM biopsy is further biased by practical limitations, such as operator experience, which is most likely ruled out by the automated method of our study [42].

Further, we observed a trend that an above threshold portion of non-fatty BM might preselect patients with higher pre-test probability of myeloma defining bone lesions and clinical diagnosis of MM. This part of our analysis, however, lacked the desired statistical power level due to a limited sample size.

A recent study correlated BM texture analysis in dual-energy CT (DECT) with BM infiltration in MM [34]. The study extracted 92 PyRadiomics features for each of 56 patients, which yielded six significant features for Pearson’s correlation with BM infiltration. In our study, the same 92 features were calculated in a supplementary analysis (supplementary data 1), resulting in two significant descriptors. However, only one descriptor was significant in both studies (“ngtdm_Contrast”), which demonstrates the common problem of reproducibility and generalizability for radiomics studies with inappropriately small sample sizes [43, 44]. On the other hand, our data suggest a significant association, which was investigated hypothesis-driven and based on biological causalities. We observed a decreasing portion of fatty BM in contrast to an increasing fluid-like and soft tissue attenuation of BM in MM, which is well-known on a microscopic level, since in MM fatty BM is displaced by plasma cells [45]. Our observation is in line with recent literature, suggesting VNCa imaging to estimate the degree of displacement of fatty BM in MM [13].

There are some limitations that need to be discussed. First, our cutoff for discrimination of pathologic BM was defined at 0 HU, which is motivated by the hypothesis that healthy, fatty BM is displaced by soft tissue– and fluid-like attenuating infiltration. A recent study found a similar cutoff between − 3 HU and 4 HU [9], while another research proposed − 44.9 HU for identification of infiltrated BM in MM [12]. The authors claim that the discrepancy most likely arises due to technical aspects [12]. However, both studies relied on subjective and semi-objective (manual region of interest measurements) analysis of the VNCa images. Since VNCa post-processing is still not regularly established in clinical practice, most radiologists have limited experience in assessment of VNCa data, which might introduce an inter-reader bias. Hence, our quantitative results are limited to the specific scanner and reconstruction used, while the general methodology is considered reproducible. In this context, a multi-center, multi-vendor study would be of interest, yet, was out of scope of this investigation. The multivariate regression plot presents several outliers in the top left quadrant, which precluded prediction of absolute changes in biopsy-determined BM infiltration per increase of BM attenuation. To elaborate on the highly significant regression, a larger study population might allow to limit its confidence intervals for absolute predictions.

Our AI-supported algorithm segmented the spine into 17 vertebrae in all cases, while some misclassifications occurred as described above. Since MM is a systemic disease and we did not expect a significant bias by incorrect anatomic numeration of the vertebrae (e.g., segmentation of T2-S1 instead of T1-L5 in case of a lumbosacral transitional vertebra), there was no necessity to interrupt the batch processing for manual alteration. Last, we included a rather small number of patients as we wanted to study treatment-naïve patients with concurrent BM biopsy, only. Hence, our feasibility study does not allow for specific conclusions about the subset of smoldering myeloma patients, since only a single case was examined. Further, performance of the threshold-based analysis for identification of MM in patients with negative conventional CT was poor (sensitivity 0.63, specificity 0.71), possibly outlining a limitation of our method to assess subtle, non-lytic BM infiltration; however, final evaluation requires a larger sample size for further examination.

Concluding, our work demonstrates the feasibility of an automated, AI-supported method to non-invasively estimate the degree of BM infiltration in MM and its premalignant conditions. In line with the recent clinical trend to question BM biopsy at first presentation of patients with evidence of monoclonal protein, we propose a tool for clinical decision support to avoid unnecessary invasive BM diagnostic.