Patient-specific finite element computer models improve fracture risk assessments in cancer patients with femoral bone metastases compared to clinical guidelines.

PURPOSE
To determine whether patient-specific finite element (FE) computer models are better at assessing fracture risk for femoral bone metastases compared to clinical assessments based on axial cortical involvement on conventional radiographs, as described in current clinical guidelines.


METHODS
Forty-five patients with 50 femoral bone metastases, who were treated with palliative radiotherapy for pain, were included (64% single fraction (8 Gy), 36% multiple fractions (5 or 6 x 4 Gy)) and were followed for six months to determine whether they developed a pathological femoral fracture. All plain radiographs available within a two month period prior to radiotherapy were obtained. Patient-specific FE models were constructed based on the geometry and bone density obtained from the baseline quantitative CT scans used for radiotherapy planning. Femoral failure loads normalized for body weight (BW) were calculated. Patients with a failure load of 7.5 x BW or lower were identified as having high fracture risk, whereas patients with a failure load higher than 7.5 x BW were classified as low fracture risk. Experienced assessors measured axial cortical involvement on conventional radiographs. Following clinical guidelines, patients with lesions larger than 30 mm were identified as having a high fracture risk. FE predictions were compared to clinical assessments by means of diagnostic accuracy values (sensitivity, specificity and positive (PPV) and negative predictive values (NPV)).


RESULTS
Seven femurs (14%) fractured during follow-up. Median time to fracture was 8 weeks. FE models were better at assessing fracture risk in comparison to axial cortical involvement (sensitivity 100% vs. 86%, specificity 74% vs. 42%, PPV 39% vs. 19%, and NPV 100% vs. 95%, for the FE computer model vs. axial cortical involvement, respectively).


CONCLUSIONS
Patient-specific FE computer models improve fracture risk assessments of femoral bone metastases in advanced cancer patients compared to clinical assessments based on axial cortical involvement, which is currently used in clinical guidelines.


Introduction
https://doi.org/10.1016/j.bone.2019.115101 Received 10 July 2019; Received in revised form 2 October 2019; Accepted 2 October 2019
Patients with bone metastases carry a risk of pathological fractures. [1][2][3] If a pathological fracture occurs in a weight-bearing bone such as the femur, this leads to an immediate decrease in the patient's mobility and self-care, and as a result to a reduced quality of life and possibly shortened survival [4,5]. Treatment of bone metastases in intact femurs is, therefore, based on the expected fracture risk [6,7]. Patients with an expected low fracture risk are treated with radiotherapy, usually a single fraction (SF) of 8 Gy, to relieve pain, whereas patients with an expected high fracture risk are considered first for preventive stabilizing surgery. If the expected fracture risk is high but the patient refuses surgery, radiotherapy in multiple fractions (MF) is given, with the goal of preventing a pathological fracture by inducing remineralization [6][7][8].
Currently, fracture risk assessment is based on available imaging such as conventional radiographs and CT scans, on which lesion characteristics like size [7,9,10] and radiographic appearance [7,9,10,11] are measured. Mirels et al. [11] developed a scoring system that is widely used for fracture risk assessment. This score combines ratings of pain, lesion type, size and location, and runs from 4 to 12. Generally, a patient should be considered for surgery if the Mirels' score is 9 or higher [11]. However, this score is known to be very conservative and results in substantial overtreatment (positive predictive value (PPV) of 14%, negative predictive value (NPV) of 100%). As a consequence, patients who would never have developed a fracture during their remaining lifetime undergo surgery [7].
Fracture risk can also be assessed by measuring axial cortical involvement of the metastatic lesion on conventional radiographs [6], which has been shown to be more accurate than Mirels' scoring system [7]. Recently, the accuracy of the 30 mm threshold of axial cortical involvement was validated in 100 patients with 110 femoral bone metastases [12]. The NPV of the 30 mm threshold was high (96-97%), indicating that the threshold was very accurate for ruling out pathological fractures. However, the PPV was limited (20-23%), which means that only one in four or five patients who were identified as high risk actually fractured their femur during follow-up, indicating substantial overtreatment [6,12].
Since actual pathological fractures result in higher morbidity and mortality [4,5] and are associated with longer hospital stays and higher costs [13] compared to prophylactic surgery of impending lesions, surgical overtreatment is generally accepted. However, unnecessary invasive treatments should be prevented as much as possible since they result in additional costs, hospitalisation and a certain risk of complications, especially in cancer patients, who often have a poor general clinical condition and limited life expectancy. Hence, there is still room for improvement and a need to develop a more accurate fracture risk assessment tool.
A patient-specific finite element (FE) computer model based on quantitative CT scans (QCT) is a promising tool for fracture risk assessment. [14][15][16][17][18][19][20] In a recent cohort study [21], we showed that the fracture risk assessments of the FE models were superior to those of experienced clinicians that assessed fracture risk in a test set-up on digitally reconstructed radiographs (DRRs). A limitation of that study was the poor visibility of the metastases on the DRRs. Therefore, the aim of the current study was to compare fracture risk assessments by FE computer models with fracture risk assessments based on axial cortical involvement on diagnostic radiographs as described in current clinical guidelines for cancer patients with femoral bone metastases.

Patients
Two multicentre prospective cohort studies were performed between August 2006 and September 2009, [21,22] and between January 2015 and April 2017, with the aim to investigate fracture risk assessment utilizing FE models in patients with femoral bone metastases. Specific inclusion and exclusion criteria have been discussed elsewhere [21,22]. In summary, patients with advanced cancer and referred for radiotherapy of bone metastases in the femur were asked to participate in four radiotherapy institutes in the Netherlands (Radboud university medical center, Nijmegen; Leiden University Medical Center, Leiden; Radiotherapeutic Institute Friesland, Leeuwarden; Bernard Verbeeten Institute, Tilburg). Ethical approval was obtained from all participating centres. Patients were treated with radiotherapy according to the current clinical guidelines [6,7]. Lesions with an axial cortical involvement ≤ 30 mm were treated with 8 Gy SF. Lesions with an axial cortical involvement > 30 mm were considered for prophylactic stabilizing surgery. However, if the patient's condition was too poor and surgery was undesirable or impossible, the patient received MF radiotherapy (e.g. 5 or 6 fractions of 4 Gy) to induce remineralisation of the bone. [23] If a patient was too ill to travel to the radiotherapy department for multiple fractions, it was accepted to deviate from the treatment guidelines and apply 8 Gy SF. Patients who were referred for surgery were not included in this study. In total, 156 patients gave informed consent. Patients were followed for six months or until a fracture occurred or until death, whichever occurred first.
Baseline characteristics (sex, age, primary tumour, radiotherapy schedule, pain (on a scale from 0 to 10), and Karnofsky performance status (KPS, on a scale from 0 to 100) [24]) were recorded prior to radiotherapy. QCT scans used for radiotherapy planning were made at baseline using a standardized protocol (120 kVp, 220 or variable mA, slice thickness 3 mm, pitch 1.5 or < 1, spiral and standard reconstruction, field of view (FOV) 480 mm, in-plane resolution 0.9375 mm). In thirteen patients, it was not possible to create FE models due to a hip or knee prosthesis (n = 5), an incompletely scanned femur (n = 7), or a missing calibration phantom (n = 1) (see Fig. 1 for the flow chart of patient inclusion for the current study). In a previous study it was shown that bone strength of femurs with osteoblastic lesions was overestimated [21], probably because the empirically established FE material model is not valid for the highly mineralized (pathological) bone tissue in osteoblastic lesions. As a result, patients with lesions of predominantly osteoblastic appearance were excluded (n = 27) from the analysis. Also, patients were excluded if the calibration was affected by an air artefact (n = 43) [21], which caused a shading artefact on the upper half of the calibration phantom and therefore resulted in unreliable calibration functions. Additionally, four patients were excluded because their body weight was missing from the clinical research files and three patients were excluded because they underwent preventive stabilizing surgery shortly after inclusion. In total, 66 patients (76 affected femurs) were included in the FE database.
For the current study, we included the patients from the FE database with conventional anteroposterior (AP) and/or lateral radiographs available within a two-month period prior to radiotherapy. [12] Patients were excluded from the current study if no radiographs were available (n = 21) [12]. This resulted in inclusion of 45 patients, with 50 affected femurs (Fig. 1). Twelve of these femurs were part of the study group published before [21].

FE models
Patient-specific femoral FE models were generated as described previously. [16,21] In short, for each irradiated femur, the three-dimensional geometry was segmented from the CT scan (Mimics 11.0 and 14.0, Materialise, Leuven, Belgium), and subsequently converted into a solid mesh of tetrahedral elements (Patran 2005r2 and 2011, MSC Software Corporation, Santa Ana, CA, USA). Additionally, the CT scans were calibrated with the use of a solid calibration phantom containing known calcium equivalent densities (Image Analysis, Columbia, KY, USA) that was scanned along with the patient. With the use of this calibration, Hounsfield units were converted to calcium equivalent values, which were used to calculate non-linear isotropic material behaviour for each tetrahedral element based on the material model of Keyak et al. [25] To correct for inter-scanner differences [26][27][28], cross-calibration using phantom scans (Gammex 467 phantom, RMI Gammex, Middleton, WI, USA) [27] was performed per scanner.
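The calibration step can be illustrated with a minimal sketch, assuming a linear relation between Hounsfield units and calcium equivalent density that is fitted by least squares. The phantom readings below are hypothetical placeholders, not values from this study, and the function names are illustrative only. The subsequent conversion of density to material properties following Keyak et al. [25] is not reproduced here.

```python
def fit_linear_calibration(hu_values, densities):
    """Fit density = slope * HU + intercept by ordinary least squares."""
    n = len(hu_values)
    mean_hu = sum(hu_values) / n
    mean_rho = sum(densities) / n
    sxx = sum((h - mean_hu) ** 2 for h in hu_values)
    sxy = sum((h - mean_hu) * (r - mean_rho)
              for h, r in zip(hu_values, densities))
    slope = sxy / sxx
    intercept = mean_rho - slope * mean_hu
    return slope, intercept


def hu_to_density(hu, slope, intercept):
    """Convert a Hounsfield unit value to calcium equivalent density."""
    return slope * hu + intercept


# Hypothetical phantom inserts: mean HU per insert and nominal
# calcium equivalent densities in g/cm^3 (illustrative values only).
phantom_hu = [0.0, 100.0, 200.0, 400.0]
phantom_density = [0.0, 0.05, 0.10, 0.20]

slope, intercept = fit_linear_calibration(phantom_hu, phantom_density)
print(hu_to_density(300.0, slope, intercept))
```

In such a scheme the fitted line would be applied voxel by voxel before material properties are assigned to each element.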
The FE models of the proximal femur were distally fixed at the knee joint centre using two high-stiffness springs (200,000,000 N/m) and loaded by incrementally displacing a cup on the femoral head in axial direction (Fig. 2; […] Corporation, Santa Ana, CA, USA). During the FE simulation, incremental displacement and contact normal forces were recorded. Failure load of the femur was defined as the maximum total reaction force. [16,21,25] Failure loads were normalized for body weight (BW). The previously determined critical threshold of failure load of 7.5 × BW was applied to distinguish low from high fracture risk femurs: patients with a failure load of 7.5 × BW or lower were identified as having high fracture risk, whereas patients with a failure load higher than 7.5 × BW were classified as low fracture risk [21].

Fig. 2. The workflow of generating the FE model. The CT scan is used to obtain the geometry, which is converted into a solid mesh. Additionally, the CT is calibrated using the calibration phantom to obtain bone mineral densities, which are used to calculate non-linear isotropic material behaviour for each tetrahedral element. The FE model is distally fixed at the knee joint centre (KJC) using two high-stiffness springs. Load was applied by displacing a cup on the femoral head in line with the hip joint centre (HJC) in axial direction.
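The failure load definition and the 7.5 × BW threshold described above can be sketched as follows. This is a minimal illustration, not the study's post-processing code: the reaction force series and body mass are hypothetical, and converting body mass to body weight with g = 9.81 m/s² is an assumption, since the text does not state how BW was expressed as a force.

```python
G = 9.81  # m/s^2; assumed conversion from body mass (kg) to body weight (N)


def failure_load(reaction_forces_n):
    """Failure load: maximum total reaction force over all increments (N)."""
    return max(reaction_forces_n)


def normalized_failure_load(reaction_forces_n, body_mass_kg):
    """Failure load expressed as a multiple of body weight."""
    return failure_load(reaction_forces_n) / (body_mass_kg * G)


def fe_risk(normalized_load, threshold_bw=7.5):
    """High risk at or below the threshold, low risk above it."""
    return "high" if normalized_load <= threshold_bw else "low"


# Hypothetical total reaction force (N) recorded per displacement increment.
forces = [0.0, 1500.0, 3200.0, 4400.0, 4905.0, 4600.0]
ratio = normalized_failure_load(forces, body_mass_kg=70.0)
print(round(ratio, 2), fe_risk(ratio))
```

Note that a failure load of exactly 7.5 × BW falls in the high-risk group, matching the "7.5 × BW or lower" wording.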

Clinical guidelines: 30 mm threshold of axial cortical involvement
Three experienced assessors (radiation oncologist, orthopaedic surgeon and skeletal radiologist) individually measured the axial cortical involvement on the available radiographs as described in earlier studies (Fig. 3). [6,12] According to current clinical guidelines, the femur was defined as being at high risk of fracture if the axial cortical involvement of the metastasis exceeded 30 mm [6]. In case of disagreement, consensus was reached through discussion between the three assessors.
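As a small sketch of the rule above, the snippet below classifies a measurement against the 30 mm threshold and flags cases in which the assessors' individual classifications differ. The measurements are hypothetical and the function names are illustrative. The consensus itself was reached by discussion and is therefore not modelled.

```python
THRESHOLD_MM = 30.0  # axial cortical involvement threshold from the guidelines


def axial_risk(involvement_mm):
    """High risk if axial cortical involvement exceeds 30 mm."""
    return "high" if involvement_mm > THRESHOLD_MM else "low"


def needs_consensus(measurements_mm):
    """True when the assessors' individual classifications disagree."""
    return len({axial_risk(m) for m in measurements_mm}) > 1


# Hypothetical measurements (mm) by three assessors for one femur.
measurements = [28.0, 31.5, 29.0]
print([axial_risk(m) for m in measurements], needs_consensus(measurements))
```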

Statistical analysis
Baseline characteristics were compared between the groups of patients who did and did not develop a pathological fracture during follow-up using Mann-Whitney U (age, pain, KPS), Fisher exact (sex, radiotherapy schedule, affected femur) and Pearson χ² (primary tumour) tests. FE predictions were compared to clinical assessments by means of diagnostic accuracy values: sensitivity, specificity, PPV and NPV. We compared sensitivities and specificities between FE and clinical assessments using McNemar's test [29].
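As a hedged illustration of the paired comparison, the sketch below computes a two-sided exact McNemar p-value from the discordant pairs. The text does not state which variant of McNemar's test was used, so the exact binomial form is an assumption, and `mcnemar_exact` is a hypothetical helper name. The example counts follow from the sensitivities reported in the Results: among the seven fractured femurs, the FE model was correct in all seven and the 30 mm threshold in six, which gives one discordant pair in favour of the FE model and none in the other direction.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: pairs where method A is correct and method B is wrong
    c: pairs where method A is wrong and method B is correct
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Sensitivity comparison derived from the reported results:
# one femur correct by FE only, none correct by the 30 mm threshold only.
p_sensitivity = mcnemar_exact(b=1, c=0)
print(p_sensitivity)
```

The discordant counts for the specificity comparison cannot be derived from the reported percentages alone, so they are not shown.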

Results
In total, 45 patients with 50 affected femurs were included in this study. Six patients developed seven fractures (14%). Median time to fracture was 8 weeks (range 1-18). Thirteen patients died during the 6 month follow-up (33%). There were no significant differences in baseline characteristics between patients who did and did not develop a femoral fracture (Table 1). Five out of the seven fractured femurs had been treated with 8 Gy SF. Examples of CT images of a few patients can be found in Supplementary Fig. 1.
According to the FE model, all seven fractured femurs had a failure load below 7.5 × BW and were accordingly correctly assessed as high risk, resulting in a sensitivity of 100% (Fig. 4, Table 2). Based on the 30 mm threshold, all but one of the fractures were correctly identified (sensitivity of 86%, difference of 14%, 95% confidence interval of -23% to 51%). Of the 43 non-fractured femurs, the FE model correctly predicted 32 femurs as low risk (specificity of 74%), whereas the 30 mm threshold accurately assessed 18 non-fractured femurs as low risk (specificity of 42%; statistically significant difference of 32%, 95% confidence interval of 14%-47%). The NPV of the FE model was 100%, demonstrating that none of the femurs with a low fracture risk as calculated by the FE model fractured. The NPV of the axial cortical involvement was slightly lower (95%). The PPV of the FE model was 39%, indicating that 39% of the femurs with a high fracture risk assessment actually fractured. For the axial cortical involvement, the PPV was 19%.
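The reported diagnostic accuracy values can be retraced from the counts given above. The sketch below is an illustrative recomputation, with `diagnostic_accuracy` as a hypothetical helper name. The counts are taken from the text: 7 fractured and 43 non-fractured femurs, with 32 (FE model) and 18 (30 mm threshold) non-fractured femurs correctly assessed as low risk, and 7 and 6 fractured femurs correctly assessed as high risk.

```python
def diagnostic_accuracy(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# FE model: 7 of 7 fractures flagged, 32 of 43 non-fractures cleared.
fe = diagnostic_accuracy(tp=7, fn=0, fp=11, tn=32)
# 30 mm threshold: 6 of 7 fractures flagged, 18 of 43 non-fractures cleared.
axial = diagnostic_accuracy(tp=6, fn=1, fp=25, tn=18)

for name, values in (("FE model", fe), ("30 mm threshold", axial)):
    print(name, {k: round(100 * v) for k, v in values.items()})
```

Rounded to whole percentages, these counts give the sensitivity, specificity, PPV and NPV quoted in the text for both methods.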
There was no correlation between the femoral failure load corrected for body weight and the axial cortical involvement measured on conventional radiographs (Fig. 5).

Discussion
For assessment of expected fracture risk in patients with femoral bone metastases, this study compared FE computer models with axial cortical involvement on conventional radiographs as described in current clinical guidelines. FE models were better at assessing fracture risk in comparison to the clinical guidelines: the fracture risk of more femurs, either high or low, was correctly assessed. Clinically, fracture risk is estimated based on radiographs or CT scans, and FE models are currently not used. Clinical guidelines, such as the 30 mm threshold of axial cortical involvement, have been constructed to align the method for fracture risk assessment between different medical specialists. However, previous studies have shown that substantial numbers of patients are under- and overtreated based on such clinical guidelines. [6,7,12] Although promising, fracture risk assessment by the FE model still resulted in a fair number of false positives (11), although there were even more when using axial cortical involvement (25). Remarkably, five out of the seven femurs that developed a fracture during follow-up were treated with SF. Since we have no information about the initial fracture risk assessment, we can only speculate on why this is the case. These patients were either incorrectly assessed as low fracture risk at inclusion, or had insufficient clinical condition to undergo MF radiotherapy. However, the latter reason seems doubtful, as patients had to be in quite good clinical condition (KPS [24] ≥ 60) to be included in the study. Therefore, this suggests that the fracture risks were probably not always assessed according to the axial cortical involvement as stated in the clinical guidelines, which underlines the need for a better, standardized and more reliable fracture risk assessment tool.
Previously, we performed a comparable study, [21] in which we also generated FE models and compared the FE with clinical fracture risk assessments based on the axial cortical involvement measured on DRRs instead of conventional radiographs. Such DRRs are radiographs reconstructed from CT images with a rather coarse resolution (0.9375 × 0.9375 × 3 mm). Hence, the DRR image quality was rather poor in comparison to conventional radiographs. In that study, we included 47 femurs of which nine fractured (twelve of these femurs are part of the current study too) and found that the FE model had higher sensitivity, i.e. was more accurate in identifying patients with a high fracture risk, compared to clinical assessments, whereas specificity was lower for the FE models than for the clinical assessments on DRRs [21]. With respect to our previous study, the sensitivity and NPV of the FE model increased (89% for the previous study vs. 100% for the current study, and 97% vs. 100%, respectively), whereas specificity and PPV were slightly lower (79% vs. 74% and 50% vs. 39%, respectively). This can be explained by the fact that part of the CT scans used in the previous study were affected by air artefacts, in addition to possible inter-scanner differences, [26][27][28] and were, therefore, analyzed for each institute separately. In the current FE database, the CT scans with air artefacts were excluded and inter-scanner differences were corrected for, resulting in more accurate FE fracture predictions. Furthermore, the sensitivity and NPV of the clinical assessments in the previous study were lower (between 0% and 22% vs. 86% for the current study, and between 80% and 84% vs. 95%, respectively), whereas mainly specificity was much higher (between 84% and 97% vs. 42%) in contrast to that of the current study. PPV varied largely in the previous study between the different clinicians (0%-50%). The diagnostic accuracy values of the current study are closer to the diagnostic accuracy of previous studies using the 30 mm threshold on conventional radiographs [6,12]. These deviating results of our previous study can be explained by the use of DRRs instead of radiographs for the clinical assessments, on which the lesions were often not well visible and which, consequently, resulted in more femurs being assessed as low risk. In the current study, conventional radiographs were used to measure the axial cortical involvement, and, therefore, these clinical assessments were more valid.

Table 1 footnotes: Pain was assessed at baseline using a pain score ranging from 0 (= no pain) to 10 (= worst imaginable pain). Karnofsky performance status [24] ranging from 0 (= lowest performance) to 100 (= highest performance).
Another study by Goodheart et al. [30] compared fracture risk estimation by FE models with Mirels' scoring system. Mirels' scoring system, based on ratings of pain, lesion type, size and location [11], is, however, known to overestimate fracture risk, leading to substantial overtreatment [31], which can be reduced when applying the 30 mm threshold [7]. Goodheart et al. [30] showed that the FE model and Mirels' scoring system had similar sensitivity, whereas specificity was higher for the FE model compared to Mirels'. They concluded that FE models can improve fracture prediction over clinical assessments based on Mirels' scoring system [30]. Previously, Van der Linden et al. [7] and recently, Van der Wal et al. [12] showed that fracture risk assessment based on axial cortical involvement of the metastases was more accurate in comparison with Mirels' scoring system. In the current study, we showed that FE models can further improve these fracture risk assessments.

Fig. 4. Femoral failure load corrected for bodyweight for the femurs that did or did not fracture during follow-up. The threshold at 7.5 × BW is based on a previous study [21] and is used to differentiate between high and low fracture risk. According to the FE model, 18 femurs had a high fracture risk, of which 7 actually fractured during follow-up. Green dots indicate correct clinical assessments and red dots indicate incorrect clinical assessments based on the 30 mm threshold of axial cortical involvement according to the clinical guidelines (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).

Table 2
Diagnostic accuracy values of the FE model and the clinical assessments based on the 30 mm threshold for axial cortical involvement of the femoral metastases on radiographs.
Assessment | Femurs that fractured during follow-up (n = 7) | Femurs that did not fracture during follow-up (n = 43) | Sensitivity | Specificity | PPV | NPV
Axial cortical involvement [6] > 30 mm (high risk) | 6 | 25 | 86% | 42% | 19% | 95%
Axial cortical involvement [6] ≤ 30 mm (low risk) | 1 | 18 | | | |
FE failure load ≤ 7.5 × BW (high risk) | 7 | 11 | 100% | 74% | 39% | 100%
FE failure load > 7.5 × BW (low risk) | 0 | 32 | | | |
Other biomechanical methods, such as computed tomography rigidity analysis (CTRA) [32,33] and high resolution MRI-based FE models [34], also show promise for assessing fracture risk. However, to our knowledge, the MRI-based FE models have not yet been tested in patients with actual metastatic lesions. Also, neither CTRA nor MRI-based FE models have yet been implemented clinically.
This study had some limitations. Firstly, conventional radiographs were not available for all 66 patients (representing 76 femurs with bone metastases) in the FE database, resulting in exclusion of 21 patients (26 femurs). Secondly, patients with large lesions and an expected high fracture risk who were surgically treated were not included. Additionally, we only included patients already referred for palliative radiotherapy, so patients without symptoms were not included in these analyses. Consequently, we cannot conclude on whether the FE model would prevent any unnecessary surgeries in those patient groups. Thirdly, some patients died shortly after inclusion or became immobile and, consequently, did not develop femoral fractures during follow-up. They might have developed fractures if they had lived longer or had engaged in activities which would increase load bearing onto their legs.
FE models are probably better at assessing fracture risk in comparison to simple measurements on conventional radiographs due to the fact that they consider for example location and 3D geometry of the lesion, general bone quality, or geometry of the bone, whereas measuring the lesion in the cortex on a two dimensional radiograph does not take these factors into account. To this date, the FE model has only been used in research settings, but we are currently working towards clinical implementation. To enable implementation, we have developed a phantomless calibration method, [35] which facilitates widespread use of QCT and FE by avoiding the requirement of a calibration phantom during the CT scan session. For this phantomless calibration, CT densities of air, fat and muscle tissue are used, which results in FE failure loads comparable to those calculated using the conventional phantom calibration. [35] A limitation of the FE models is that they are currently not applicable for patients with predominantly osteoblastic lesions. This is due to the fact that the material model used is apparently not valid for the high CT density of such lesions and, hence, results in aberrant material properties [21]. None of the patients affected with predominantly osteoblastic lesions developed a fracture during follow-up. In the near future, if patients with femoral bone metastases visit their medical specialist, an FE fracture risk assessment can be ordered, just like ordering a lab-test. The treating physician can then discuss the results of the FE model with the patient, taking into account clinical factors such as the patient's clinical condition, life expectancy, and activity level.
In conclusion, this study showed that patient-specific FE models improve femoral fracture risk assessments in comparison to measuring axial cortical involvement on conventional radiographs as described in the current clinical guidelines on bone metastases. The FE models could prevent unnecessary surgical procedures and thereby improve the quality of life of those patients. Clinical implementation of the FE models is therefore supported.

Fig. 5.
The correlation between the femoral failure load corrected for body weight and the axial cortical involvement measured on conventional radiographs. The dotted lines depict the thresholds for differentiating between high and low fracture risk.