Vertebral CT attenuation outperforms standard clinical fracture risk prediction tools in detecting osteoporotic disease in lung cancer screening participants

Objectives: Compare accuracy of vertebral Hounsfield Unit (VHU) attenuation and FRAX and Garvan Fracture Risk Calculators in identifying low bone mineral density (BMD) and prevalent vertebral compression fractures (VF) in lung cancer screening (LCS) participants. Methods: Baseline CT scans from a single site of the International Lung Screen Trial were analysed. BMD was measured using VHU (of the most caudally imaged vertebra) and quantitative CT (QCT) (low BMD defined as <110 HU and <120 mg/cm3, respectively). Prevalent VF were classified semi-quantitatively. 10-year FRAX and Garvan fracture risks were calculated using dual energy X-ray absorptiometry (DXA) femoral neck T-score where available. Discrimination was assessed by area under receiver-operating characteristic curves (AUC). Results: 535 LCS participants were included; 41% had low VHU-BMD, 56% had low QCT-BMD and 10% had ≥1 VF with ≥25% vertebral height loss. VHU demonstrated 94% specificity and 70% sensitivity in identifying low QCT-BMD. VHU was superior to fracture risk tools in discriminating low QCT-BMD (AUC: VHU 0.94 vs FRAX 0.67, Garvan 0.64 [p < 0.05]). In 64 participants with recent DXA scans, VHU was superior to FRAX T-score and Garvan T-score in discriminating low QCT-BMD (AUC: VHU 0.99, FRAX T-score 0.71, Garvan T-score 0.71 [p < 0.05]). VHU was non-inferior to FRAX T-score and Garvan T-score in discriminating VF (AUC: VHU 0.65, FRAX T-score 0.53, Garvan T-score 0.61). Conclusions: VHU outperforms clinical risk calculators in detecting low BMD and discriminates prevalent VF as well as risk calculators with T-scores, yet is significantly simpler to perform. Advances in knowledge: VHU measurement could aid osteoporosis assessment in high-risk smokers undergoing LCS.


INTRODUCTION
Lung cancer is the leading cause of cancer-related mortality worldwide. Lung cancer screening (LCS) with low-dose chest computed tomography demonstrated mortality benefits in large, international trials. 1,2 LCS can potentially detect other diseases that share smoking as an aetiology, impacting patient outcomes, cost-effectiveness and generalisability. Osteoporosis represents a possible target disease: LCS participants may harbour undiagnosed osteoporosis due to the risks of smoking history and older age. 3,4 Osteoporosis is often occult but can be detected by CT assessment of vertebral bone mineral density (BMD) and vertebral fractures (VF), prompting early intervention.
Standard clinical osteoporosis diagnosis involves fracture risk assessment and BMD testing. This is commonly achieved using multivariable risk calculators, such as FRAX or Garvan. [7][8][9][10][11] A presumptive diagnosis of osteoporosis can also be made in the presence of a non-traumatic spinal compression fracture. 5 The WHO-preferred BMD test is DXA, due to wide availability and low radiation dose. Although QCT has superior sensitivity for interval BMD change and avoids confounding factors such as degenerative bone disease and aortic calcification, which may lead to overestimation of areal DXA-BMD, 12 it requires specialised analysis software. QCT is also limited by radiation dose, although this becomes irrelevant if pre-existing CT images are analysed. 12 Due to high disease burden and economic cost of osteoporosis, 13 early detection and fracture prevention in high-risk individuals is cost-effective. 14,15 Smoking history alone does not qualify individuals for subsidised BMD testing in Australia, the UK and the USA. LCS participants aged below 70 years, although at risk of osteoporosis, are thus potentially excluded from BMD testing unless they have other significant risk factors or, in the USA, the UK and parts of Europe, if their estimated fracture risk meets a pre-defined clinical risk threshold. 5,6,10,11 Finding scalable alternatives to established BMD tests represents an important goal for patient care, especially if incorporation into population-wide initiatives such as LCS can add value.
Hounsfield Unit-attenuation of the vertebrae (VHU) is a proxy BMD measure. VHU has been studied in CT performed for numerous clinical indications [16][17][18] and demonstrates high sensitivity for DXA-defined osteoporosis. 19 Low VHU is also associated with all-cause mortality in LCS participants. 3 However, comparison against "standard-of-care" clinical fracture risk tools (with and without BMD T-scores) is necessary to support VHU as a screening tool in LCS.
To address this gap, we aimed to 1) compare VHU to FRAX and Garvan tools in discrimination of low QCT-BMD; 2) compare VHU, QCT, FRAX and Garvan tools in discriminating prevalent VF; and 3) assess VHU reproducibility in an LCS cohort.

Study design
This was a substudy of participants from a single site of the International Lung Cancer Screening Trial (ILST). 20 [22] Osteoporosis-relevant and other medical and demographic variables were prospectively collected at baseline. Baseline screening CT scans were analysed at the per-vertebra and per-participant level. Ethical approval was obtained through the Hospital Human Research Ethics Committee. Participants provided written informed consent.

CT acquisition
Participants underwent CT scans between 2017 and 2019, following a standard low-dose protocol (GE Healthcare Revolution™ 256-detector row CT scanner, 120 kV, 40-50 mA, 1:0 pitch, 1.8 mSv mean effective dose recorded by a dose-tracking system). Reconstructed images (standard soft-tissue kernel with ≤1 mm thickness and spacing) were stored in the hospital picture archiving and communication system (PACS).
One observer assessed BMD and VF from all scans following training and assessment by a consultant radiologist and blinded validation against a subset of scans from a prior LCS study. 23 Blinded QCT and VHU assessments were performed at least 2 weeks apart. The observer repeated blinded VHU assessments on 100 randomly selected scans after 4-6 months. Randomisation was performed in Microsoft Excel, using the RAND function to assign a value to each scan, ordering the scans by that value and then selecting every fifth scan.
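The random selection of scans for repeat assessment can be mirrored in a few lines of code. The following is a minimal sketch under stated assumptions, not the spreadsheet workflow used in the study: the function name `select_every_kth`, the seed and the scan identifiers are hypothetical, and because the text does not say how the final 100 scans were taken from the ordered list, the sketch simply returns every fifth scan.

```python
import random
from typing import List, Sequence


def select_every_kth(scan_ids: Sequence[str], k: int = 5, seed: int = 0) -> List[str]:
    """Assign each scan a random value, order by it, then keep every k-th scan."""
    rng = random.Random(seed)  # hypothetical fixed seed so the draw is repeatable
    keyed = [(rng.random(), scan_id) for scan_id in scan_ids]  # analogue of RAND()
    ordered = [scan_id for _, scan_id in sorted(keyed)]
    return ordered[k - 1::k]  # 5th, 10th, 15th, ... entries of the ordered list


# Illustrative identifiers only; 535 matches the number of included participants.
scans = [f"scan_{i:03d}" for i in range(1, 536)]
subset = select_every_kth(scans)
```

Seeding the generator makes the draw reproducible, which a volatile spreadsheet function does not guarantee unless the generated values are pasted as constants.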

Clinical osteoporotic fracture risk assessment
Participant 10-year risk of hip or major osteoporotic fracture (hip, spine, distal forearm, or proximal humerus) was estimated using FRAX (Australia-specific version) and Garvan calculators excluding DXA T-score, using self-reported clinical variables. 28,29 For participants with DXA studies ≤ 2 years prior to baseline CT scan, FRAX and Garvan risk were also calculated including femoral neck T-score (FRAX T-score and Garvan T-score). As per treatment initiation thresholds, 10-year FRAX or Garvan hip fracture risk >3% or major osteoporotic fracture risk >20% was deemed "high-risk". 5,15 Quality control of questionnaire data was performed by a physician. Missing or equivocal responses were clarified directly with participants by telephone. Reported fracture history was confirmed (in preferential order) against historical imaging, medical reports or detailed participant history. Fractures occurring from standing height or below, or due to circumstances that would not cause fracture in a healthy adult, were considered "low-trauma". 10,28 Risk factors unknown to participants were presumed absent. Participants with incomplete clinical data following quality control were excluded.
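The "high-risk" rule stated above reduces to a two-threshold check. The sketch below is illustrative only: the function name `is_high_risk` and its argument names are hypothetical, and the risk percentages are assumed to have already been obtained from the FRAX or Garvan calculators, which are not reimplemented here.

```python
def is_high_risk(hip_risk_pct: float, mof_risk_pct: float) -> bool:
    """Apply the treatment-initiation thresholds quoted in the text.

    hip_risk_pct: 10-year hip fracture risk in percent.
    mof_risk_pct: 10-year major osteoporotic fracture risk in percent.
    """
    # Either threshold alone is sufficient; both comparisons are strict (>).
    return hip_risk_pct > 3.0 or mof_risk_pct > 20.0


# Invented example values, not study data.
examples = [(1.2, 8.5), (3.4, 12.0), (2.0, 21.5), (3.0, 20.0)]
flags = [is_high_risk(hip, mof) for hip, mof in examples]
```

Note that a participant sitting exactly on a threshold (3% hip, 20% major osteoporotic fracture) is not flagged, because the text defines high risk with strict inequalities.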

Analysis
Statistical analysis was performed using RStudio (V1.3.1073, Boston, MA, USA) and Stata software (V17, StataCorp LLC, College Station, TX, USA). Means were compared using Student's t-test. Proportions were compared using chi-squared test. p-values < 0.05 were considered statistically significant. Agreement between BMD methods was assessed using regression-based Bland-Altman analysis. 21 Correlation was assessed using Pearson's coefficient (R). Discrimination of methods (VHU, mean QCT-BMD, FRAX and Garvan 10-year risk of major osteoporotic fracture with and without DXA T-score) was assessed using area under receiver-operating characteristic curves (AUC). Optimal VHU threshold for discriminating low QCT-BMD was determined using Youden index. Associations between key variables and prevalent VF were modelled using logistic regression.
Intra- and inter-observer reliability of VHU was measured using intra-class correlation coefficient (ICC) and intra-observer reliability also using root-mean-square-error (RMSE). Inter-observer reliability was measured from blinded VHU measurements performed in a retrospective training data sample (n = 29) by clinicians of differing experience (resident physician, pulmonologist, radiologist and a radiology trainee).
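To make the discrimination measures concrete, the sketch below computes an AUC through its rank (Mann-Whitney) interpretation and picks a cut-off by maximising the Youden index J = sensitivity + specificity - 1. It is a simplified illustration, not the R or Stata code used in the study: the function names are hypothetical, the attenuation values are invented, and a lower VHU is assumed to indicate low BMD, so a value below the threshold counts as test-positive.

```python
from typing import List, Sequence, Tuple


def auc_lower_is_positive(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Probability that a random positive case has a lower value than a random
    negative case, counting ties as one half (equivalent to the ROC AUC)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_threshold(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J, classifying value < threshold as positive."""
    values: List[float] = sorted(set(pos) | set(neg))
    # Candidate cut-offs are midpoints between consecutive observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_j = candidates[0], -1.0
    for t in candidates:
        sensitivity = sum(v < t for v in pos) / len(pos)
        specificity = sum(v >= t for v in neg) / len(neg)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # strict comparison keeps the lowest cut-off on ties
            best_t, best_j = t, j
    return best_t, best_j


# Invented VHU values (HU): 'low' = low QCT-BMD, 'normal' = normal QCT-BMD.
low = [80.0, 95.0, 100.0, 118.0, 125.0]
normal = [105.0, 130.0, 140.0, 150.0, 160.0]
auc = auc_lower_is_positive(low, normal)
threshold, j_stat = youden_threshold(low, normal)
```

The Youden index weights sensitivity and specificity equally; a screening programme that prefers to limit false negatives may deliberately choose a higher, more sensitive cut-off than the one this criterion returns.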
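The two reliability measures can be sketched as follows. The text does not state which ICC form was used, so the one-way random-effects, single-measurement form (often written ICC(1,1)) is an assumption made purely for illustration; the function names and the measurement values are likewise hypothetical.

```python
from math import sqrt
from typing import Sequence


def rmse(first: Sequence[float], second: Sequence[float]) -> float:
    """Root-mean-square error between two sets of paired measurements."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(first, second)) / len(first))


def icc_oneway(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1) (assumed form).

    Each row is one scan; each column is one repeated measurement of that scan.
    """
    n = len(ratings)      # number of scans
    k = len(ratings[0])   # measurements per scan
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-scan and within-scan mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented repeat VHU readings (HU) for three scans, two readings each.
readings = [[100.0, 102.0], [120.0, 118.0], [140.0, 142.0]]
error = rmse([r[0] for r in readings], [r[1] for r in readings])
reliability = icc_oneway(readings)
```

RMSE stays in Hounsfield Units and so shows the typical size of a repeat-measurement difference, whereas ICC is unitless and depends on how widely the scans themselves vary.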

Sample size
A projected sample of ≥500 participants aimed to maximise incident fracture yield over the prospective 5-year trial period (estimated from previous Australian population-based cohort studies) in order to accurately assess these outcomes at trial completion. 30

Participants
Study flow is shown in Figure 2. Thirty-eight of 595 ILST participants did not provide informed consent; six had missing clinical risk information; 16 had technical issues with CT analysis (seven scans could not be retrieved from PACS for BMD analysis and nine scans encountered a calibration module error during QCT analysis). 246 (46%) participants were active smokers (Table 1). Excluded participants were younger (median age 62 [IQR = 12] years), predominantly male (73%, p = 0.042) and somewhat more likely to be active smokers (58%; p = 0.093).
In multivariable analysis, male gender was associated with 92% increased odds of a prevalent VF (OR: 1.92 [95% CI: 1.04-3.67]) (Supplementary Table 2). Age, BMI and active smoking did not reach statistical significance (Supplementary Table 2). Results were similar when all VF grades were included (Supplementary Tables 3 and 4).
In participants with previous DXA, VHU was non-inferior to

Correlation and agreement between methods
VHU correlated highly with QCT-BMD (R = 0.90, RMSE 14.1) (Figure 5). In Bland-Altman analysis, mean difference between VHU and QCT values for an individual was −0.73. Regression between means and differences showed little difference on average throughout the measurement range (Supplementary Figure 1). Smaller values rarely differed by more than 20 units, and larger values rarely differed by more than 40 units.
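The agreement analysis reported here can be illustrated with a short sketch of a regression-based Bland-Altman calculation: each participant's difference between the two methods is regressed on the mean of the two values, so a slope near zero indicates that the bias does not change across the measurement range. The function name and the paired values are hypothetical, ordinary least squares is assumed for the regression, and, as in the text, the VHU and QCT values are differenced directly despite their different units.

```python
from statistics import mean
from typing import Sequence, Tuple


def bland_altman_regression(
    a: Sequence[float], b: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (mean difference, slope, intercept) for differences (a - b)
    regressed on the pairwise means by ordinary least squares."""
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    m_bar, d_bar = mean(means), mean(diffs)
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - d_bar) for m, d in zip(means, diffs))
    slope = sxy / sxx
    intercept = d_bar - slope * m_bar
    return d_bar, slope, intercept


# Invented paired values for five participants (VHU in HU, QCT in mg/cm3).
vhu = [88.0, 105.0, 131.0, 150.0, 172.0]
qct = [90.0, 104.0, 135.0, 149.0, 170.0]
mean_diff, slope, intercept = bland_altman_regression(vhu, qct)
```

A mean difference close to zero together with a flat regression line corresponds to the pattern described in the text, namely little systematic difference on average throughout the measurement range.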

DISCUSSION
In line with prior studies, our high-risk LCS cohort harboured high prevalence of QCT-defined osteoporosis (7%) and osteopenia (49%), occult in 51% of participants. Using spinal DXA, the trial reported a higher osteoporosis prevalence of 27% but similar osteopenia prevalence of 44% (n = 302; 98 men). 25 Another multi-centre study of LCS participants reported osteoporosis and osteopenia prevalence of 15% and 42%, respectively, using L1-L2 QCT (n = 34,153; 59% men). 31 Our lower reported osteoporosis prevalence may arise from using higher vertebrae (T11 and T12) in QCT analysis. 32 Vertebral fractures (VF) appeared highly prevalent in our cohort (35%) and in the NELSON cohort (39%). 3 We note male gender was associated with increased odds (OR 1.92) of prevalent VF, despite females generally experiencing a higher rate of BMD decline post-menopause. We suggest that osteoporosis risk assessment is perhaps more readily undertaken in females, leading to increased use of bone protection medication; however, we do not have data to confirm or refute this.
Br J Radiol;96:20220992 BJR osteoporosis detection in lung cancer screening
VHU performed well against reference-standard QCT. Previous studies similarly report high discrimination of VHU against QCT. A recent study of non-contrast spinal LDCT (n = 180) reported an AUC of 0.97. 17 Another LCS study (n = 457) reported AUC of 0.96 on comparing thoracic and upper lumbar vertebral attenuation to QCT. 4 AUC values appear comparatively lower when comparing VHU to DXA (0.74 to 0.79 19,25), likely due to confounding factors affecting areal DXA assessment. 12 A validated method like asynchronous QCT may seem favourable; however, image transfer, translation and analysis are more time-consuming than a simple 30 s VHU measurement. While it is not surprising that VHU performs well against QCT, as both are based on Hounsfield Unit measurement, VHU is not approved for use in bone density screening; however, our data support the notion that VHU presents a scalable alternative in a screening setting. VHU threshold of 110 HU was highly specific but only moderately sensitive in classifying low QCT-BMD. This threshold, previously reported to be 91% specific in detecting DXA-defined osteoporosis using contrast-enhanced and non-contrast abdominal CT scans, was suggested for lower risk populations to minimise false-positive results. 24 However, optimal thresholds vary between CT methods and cohorts, limiting generalisability. 16 In a high-risk LCS cohort, more sensitive VHU cut-offs of 126 HU (osteoporosis and osteopenia) and 95 HU (osteoporosis) may be more appropriate to limit false-negative results.
VHU closely estimates BMD, the risk factor most strongly associated with fracture. 28 In the absence of DXA T-score, VHU outperformed FRAX and Garvan in discriminating participants with low QCT-BMD and prevalent VF. Whilst inclusion of T-score reportedly improves the predictive ability of clinical tools for incident fractures, 9 we did not find significant improvement in prediction of prevalent VF, which itself is predictive of incident fracture. 28 VHU was similar to both FRAX T-score and Garvan T-score in discriminating participants with VF (AUC 0.53 and 0.61, respectively). This finding is important; risk estimation using FRAX and Garvan algorithms requires additional clinical information (subject to recall bias) and DXA imaging or QCT analysis, while, in comparison, measuring VHU is rapid, simple and reproducible, using non-specialised software on existing low-dose CT images. We found high intra-observer ICC 0.99 and inter-observer ICC 0.93, obtained from a sample of clinicians with varying levels of radiological experience, supporting method accessibility. This is consistent with previous studies which reported excellent inter- and intra-observer reliability ICC 0.96 and 0.94-0.98, respectively. 18 Our results support further exploration of VHU in incident fracture prediction. To address this, our cohort will be followed up for health and fracture outcomes over the 5-year ILST trial period.
Recent large studies in USA and Israel compared automated CT-derived bone metrics from chest and abdominal CT scans against FRAX scores algorithmically calculated from electronic health records (EHR) from general populations and colon cancer screenees. They found automated CT-derived bone metrics were comparable with FRAX scores in predicting incident fractures. 33,34 In contrast, our study employed clinician-conducted VHU measurements and clinician-validated FRAX calculation, replicating standard clinical workflow and reducing bias attributed to missing data. Our findings are relevant to clinical practice in the absence of an EHR, applicable in any clinical setting. Aligning with the findings of these studies, our results add to the generalisability of using bone density metrics in low-dose chest CT scans for high-risk LCS participants.
This study has some limitations. Unlike dedicated spine QCT scans, lumbar vertebrae are not always imaged on LCS CT scans, and thus VHU was not consistently measured from the same level. The decreasing intervertebral BMD gradient from T11 to L1 may cause under-recognition of osteoporosis when measuring BMD in thoracic rather than lumbar vertebrae. 4,32 Although we found minimal difference between T12 and L1 attenuation in low QCT-BMD discrimination (AUC 0.92 and 0.96 respectively), consistent measurement from a higher vertebral level could be considered in further studies with a view to establishing thresholds for low-BMD classification. Similarly, using the average QCT-BMD value across T11-L1 may underestimate osteoporosis compared to the standard L1-L4 average. 32 VF occurring outside the imaged field were also potentially omitted, contributing to bias. However, as osteoporotic VF most commonly occur in the thoracic spine or thoracolumbar junction, most VF are likely to be captured in this study. 35 Although QCT was used as reference-standard, DXA is the preferred standard test. While comparison between QCT and DXA was not our primary focus, we reported low sensitivity for QCT (35%) in identifying participants with prior DXA-defined osteoporosis, and low specificity (41%) in identifying prior DXA-defined osteoporosis or osteopenia within a small sample. As QCT is more sensitive to low BMD than DXA, "false-positives" may represent osteoporotic individuals undetected by DXA. Conversely, low specificity of QCT for detecting low DXA-BMD could be explained by measurement of higher vertebrae for QCT compared to DXA.
Our targeted study population of LCS participants limits wider applicability of results to non-screening cohorts due to reported variability in diagnostic accuracy of VHU. 16 The narrow age window of LCS participants also limits applicability to younger smokers. However, LCS is rapidly becoming standard practice in many countries, with 14.5 million screen-eligible individuals estimated in the USA alone. 36 With well-defined participant selection and LDCT protocols internationally, our results will have widespread generalisability to large LCS cohorts.
In summary, we conclude that VHU can more simply and accurately discriminate LCS participants with occult osteoporotic disease, compared to standard-of-care fracture risk tools, regardless of prior T-score estimates. Among these individuals, who may not routinely access formal bone density testing, opportunistic VHU assessment and routine reporting on screening scans will add value to LCS, facilitating early disease detection and prompt intervention.

Figure 1. Example of region of interest (ROI) placement within the vertebral body for measurement of VHU (V1 shown in image).

Figure 2. STARD Diagram describing flow of participants through study. ILST = International Lung Screening Trial. QCT = Quantitative CT. PACS = picture archiving and communication system.

Figure 3. Receiver-operating characteristic curves showing vertebra Hounsfield Unit attenuation (VHU) compared to clinical risk scores (FRAX and Garvan) in discriminating low quantitative CT-defined bone mineral density.

Table 2. AUC for variables in classifying low BMD and prevalent vertebral fracture in the entire study cohort (n = 535) and in a subset of individuals with DXA T-scores (n = 64).
a FRAX or Garvan 10-year risk of any major osteoporotic fracture, calculated without T-score unless specified.