Machine‐Learning Assessed Abdominal Aortic Calcification is Associated with Long‐Term Fall and Fracture Risk in Community‐Dwelling Older Australian Women

Abdominal aortic calcification (AAC), a recognized measure of advanced vascular disease, is associated with higher cardiovascular risk and poorer long‐term prognosis. AAC can be assessed on dual‐energy X‐ray absorptiometry (DXA)‐derived lateral spine images used for vertebral fracture assessment at the time of bone density screening using a validated 24‐point scoring method (AAC‐24). Previous studies have identified robust associations between AAC‐24 score, incident falls, and fractures. However, a major limitation of manual AAC assessment is that it requires a trained expert. Hence, we have developed an automated machine‐learning algorithm for assessing AAC‐24 scores (ML‐AAC24). In this prospective study, we evaluated the association between ML‐AAC24 and long‐term incident falls and fractures in 1023 community‐dwelling older women (mean age, 75 ± 3 years) from the Perth Longitudinal Study of Ageing Women. Over 10 years of follow‐up, 253 (24.7%) women experienced a clinical fracture identified via self‐report every 4–6 months and verified by X‐ray, and 169 (16.5%) women had a fracture hospitalization identified from linked hospital discharge data. Over 14.5 years, 393 (38.4%) women experienced an injurious fall requiring hospitalization identified from linked hospital discharge data. After adjusting for baseline fracture risk, women with moderate to extensive AAC (ML‐AAC24 ≥ 2) had a greater risk of clinical fractures (hazard ratio [HR] 1.42; 95% confidence interval [CI], 1.10–1.85) and fall‐related hospitalization (HR 1.35; 95% CI, 1.09–1.66), compared to those with low AAC (ML‐AAC24 ≤ 1). Similar to manually assessed AAC‐24, ML‐AAC24 was not associated with fracture hospitalizations. The relative hazard estimates obtained using machine learning were similar to those using manually assessed AAC‐24 scores. 
In conclusion, this novel automated method for assessing AAC, which can be easily and seamlessly captured at the time of bone density testing, has robust associations with long‐term incident clinical fractures and injurious falls. However, the performance of the ML‐AAC24 algorithm needs to be verified in independent cohorts. © 2023 The Authors. Journal of Bone and Mineral Research published by Wiley Periodicals LLC on behalf of American Society for Bone and Mineral Research (ASBMR).


Introduction
Dual-energy X-ray absorptiometry (DXA) machines measure bone mineral density (BMD) to assess osteoporosis and risk of fracture, with very low radiation. DXA machines are also increasingly being used to perform vertebral fracture assessment (VFA) to visually identify asymptomatic, clinically unrecognized vertebral fractures at the time of bone density screening. [1] This involves obtaining lateral images of the thoracolumbar spine. These lateral spine images can also be used to identify abdominal aortic calcification (AAC), a marker of advanced atherosclerotic vascular disease, most commonly assessed using a semiquantitative 24-point scale (AAC-24). [2,3]][10][11] A major limitation for the clinical assessment of AAC is that it needs to be manually scored by a trained expert, a time-consuming task that limits the possibility for AAC assessment to be scaled and incorporated into routine clinical assessment, especially as part of osteoporosis screening. To address this limitation, machine learning (ML) approaches have been investigated to automate the identification of AAC and AAC-24 scoring from lateral spine images. [12,13] This builds on work to automate measurement of aortic calcification from abdominal computed tomography (CT) scans. [14] We recently developed and tested an ML algorithm for the automated assessment of AAC (ML-AAC24) from DXA-derived lateral spine images acquired from various major manufacturers (and models) currently used clinically. These scores were validated against future major adverse cardiovascular events, cardiovascular mortality, and all-cause mortality from Hologic and GE machines. [13]
Given that extensive AAC is associated with poorer musculoskeletal outcomes (e.g., falls [7] and fracture [8]), determining whether these relationships remain comparable after automation (i.e., ML-AAC24) is a critical step to enhance the clinical utility of lateral spine images at the time of BMD measurement. Therefore, this study examined whether ML-AAC24 is related to long-term falls and fracture risk in community-dwelling older women, representing a high-risk population.

Methods
We have previously reported on the relationship between manually assessed AAC and hospitalized fall [7] and fracture outcomes [8] in the study population, as described in the following section. Development of the ML approach used in this study (ML-AAC24) has also been detailed. [13] The following section provides a brief overview.

Study population
The present study drew upon participants from the Perth Longitudinal Study of Ageing Women (PLSAW), which enrolled 1500 women aged 70 years or older in a 5-year, double-blind, randomized controlled trial (RCT) to determine the effect of calcium supplements on preventing osteoporotic fracture. [15] Following the trial, women were observed for a further 10 years. Women were excluded from the original RCT that forms the basis for our observational study analysis if they were taking any bone-active medication. [15] Lateral spine DXA images were obtained in 1024 women at baseline and/or 1 year of the RCT (1998/99). One participant's scan could not be read by our algorithm, leaving 1023 women for the current analysis. All 1023 women were included in fracture analyses and unadjusted and minimally adjusted falls analysis, whereas 998 women were included in multivariable-adjusted falls analysis due to missing covariate data. Ethics approval for the 5-year RCT and a subsequent 10-year follow-up was granted by the Human Research Ethics Committee at the University of Western Australia, and the use of linked data was approved by the Western Australian Department of Health (project number #2009/24). Written informed consent, including explicit authorization for future access to Western Australian Department of Health data, was obtained from all participants.

Baseline risk assessment
Body mass index (BMI, kg/m²) was calculated from body weight measured using digital scales and height assessed using a wall-mounted stadiometer. Smoking status was classified as nonsmoker or ever smoked, which included individuals who had smoked at least one cigarette per day for 3 months or more, as well as current smokers. Participants' physical activity levels were assessed by asking them to report any sports, recreational activities, and regular physical activities they had participated in during the 3 months before their baseline visit. The energy expenditure in kJ/day for each participant was calculated, taking into account their body weight, using published energy costs for various activities. [16] The Socioeconomic Indexes for Areas developed by the Australian Bureau of Statistics were used to determine socioeconomic status by ranking residential postcodes based on their relative socioeconomic advantage/disadvantage. Participants were subsequently assigned to one of six groups, ranging from the most highly disadvantaged (top 10%) to the least disadvantaged (top 10%). [17] Medication use, including antihypertensives, statins, and oral hypoglycemic medications, was confirmed by each participant's primary care physician and classified according to the International Classification of Primary Care-Plus system (T89001-T90009). Prevalent diabetes at baseline was determined by insulin or oral hypoglycemic medication use. Prevalent atherosclerotic vascular disease (ASVD) was determined using primary discharge diagnoses from hospital data from 1980 to 1998 and retrieved from the Western Australian Data Linkage System, as described. [4] Atherosclerotic events were defined using primary diagnosis codes from the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). [4] Previous falls were determined by asking participants if they had fallen in the 3 months prior to their baseline clinical visit.

Assessment of AAC
AAC was assessed from lateral spine images obtained at baseline or at 1 year (1998-1999). All lateral spine images were obtained from a Hologic 4500A bone densitometry machine, which captured digitally enhanced single-energy-derived lateral images of the thoracolumbar region. The ML-AAC24 scores used in the current study were obtained from our previous work focused on automated AAC-24 assessment for cardiovascular disease (CVD) outcomes, and the ML algorithm was only trained on expertly assessed AAC-24 scores. [13] A brief explanation of the training process is as follows [13]: a regression network was employed based on image features. This was based on the Kauppila AAC 24-point semiquantitative scoring method (AAC-24), which is the most widely used method to manually score AAC. [13,18] Briefly, the framework consists of three modules: image preprocessing, feature extraction, and regression. The pretrained EfficientNet-B3 was used for feature extraction due to its efficiency and high performance. The last fully connected convolutional layer of the EfficientNet-B3 model was replaced with a custom-designed regression network. The regression network was used to predict total AAC score on a continuous scale, ranging from 0 to 24, and comprised two dense layers with batch normalization and rectified linear unit activation function and a fully connected final layer with linear activation. This model was trained on the single-energy images from the Hologic 4500A machine. [13] The ML-AAC24 scores were categorized into three groups, based on previous work when assessed manually [4,8]: low (ML-AAC24 ≤ 1), moderate (ML-AAC24 ≥ 2 to <6), and extensive (ML-AAC24 ≥ 6). A 10-fold stratified cross-validation was performed to account for the imbalance in the different AAC categories in the training dataset. The average performance of the model was reported after 10 repetitions of training on nine folds and testing on the remaining one, as described. [13]
For the current study, ML-AAC24 was categorized as low (ML-AAC24 ≤ 1) or moderate to extensive (ML-AAC24 ≥ 2), as in our previous work considering manually assessed AAC and fractures [8] in this cohort. As reported, there was substantial accuracy and agreement between the automatic ML-AAC24 scores and the manually assessed AAC-24 scores. [13] Specifically, Sharif et al. [13] report that the Cohen's weighted kappa for manual AAC-24 versus ML-AAC24 assessed from lateral spine images obtained via the Hologic 4500A machine was 0.51, with an intraclass correlation coefficient of 0.76 (95% confidence interval [CI], 0.74-0.78). Performance of ML-AAC24 to classify individuals into the low versus moderate to extensive ML-AAC24 groups was also described [13]: accuracy 81.0% (79.8-82.0); sensitivity 82.9% (81.5-84.3); specificity 78.5% (76.7-80.2); positive predictive value 82.9% (81.7-84.0); and negative predictive value 78.5% (77.0-79.9). Of note, the agreement was substantial across different DXA makes and models (Hologic 4500A and Horizon, GE Lunar Prodigy and iDXA), suggesting that the algorithm trained on Hologic 4500A images is generalizable to other commonly used bone density machines. [13]

Prevalent and incident clinical fractures
Prevalent self-reported fractures were determined at baseline by asking the age and location of any fracture sustained. Only fractures satisfying specific criteria were considered prevalent: fractures that occurred after the age of 50 years and prior to the participant's baseline clinical visit due to minimal trauma (defined as falling from a height of 1 m or less), excluding fractures involving the face, skull, fingers, or toes. Prevalent vertebral fractures were determined from lateral single-energy images of the thoracolumbar spine and scored using the Genant semiquantitative method. [19] A modification was that grade 1 fractures were considered fractures only if there was clear endplate depression or cortical discontinuity. [20] Incident clinical fractures, including vertebral fractures, coming to clinical attention over 10 years were recorded in an adverse events diary, collected every 4 months during the first 5 years and every 6 months during the second 5 years. The diagnosis of all incident clinical vertebral and nonvertebral fractures was confirmed from radiographic records.

Fall and fracture-related hospitalization
Fall and fracture-related hospitalizations were collected from the Western Australia Hospital Morbidity Data Collection (HMDC) using the Western Australian Data Linkage System, which provides a complete validated record of every participant's primary diagnosis at hospital discharge using coded data from all hospitals in Western Australia. The HMDC records of all women were obtained from their baseline clinical visit in 1998 and over the next 14.5 years for hospitalized falls and 10 years for fracture-related hospitalization to enable comparisons to previous work in this cohort. [7,8] Diagnosis codes were defined using the ICD-9-CM codes for 1998 to 1999 [21] mapped to the ICD-10 Australian Modification (ICD-10-AM) for 1999 to 2013. [22] Hip and fracture-related hospitalizations were identified using the following ICD-10 codes: S02, S12, S22, S32, S42, S52, S62, S72, S82, S92, M80, T02, T08, T10, T12, and T14.2. Fractures of the face (S02.2-S02.6), fingers (S62.5-S62.7), and toes (S92.4-S92.5), as well as fractures caused by motor vehicle injuries (External Cause of Injury codes V00-V99), were excluded. Major osteoporotic fractures included those of the hip, spine, humerus, and wrist, as defined by the Fracture Risk Assessment Tool (FRAX). Fall-related hospitalizations were identified based on the international classification of external causes of injury codes and ICD-coded discharge data for all public and private inpatient admissions. Falls from standing height or less that were not caused by an external force were identified using the following ICD-10 codes: W01, W05, W06, W07, W08, W10, W18, and W19. Because the HMDC captures coded diagnosis data pertaining to all public and private inpatient contacts in Western Australia, it allows ascertainment of both fall and fracture-related hospitalizations independently of patient report and its associated problems, such as loss to follow-up.
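The code-based case definitions above can be sketched as a small classifier. This is an illustrative simplification, not the study's linkage code: it assumes diagnosis codes arrive as dotted ICD-10 strings (for example "S72.0"), uses plain prefix matching, and all function and constant names are hypothetical.

```python
# Illustrative sketch only: prefix matching on dotted ICD-10 codes (assumed format).
FRACTURE_PREFIXES = (
    "S02", "S12", "S22", "S32", "S42", "S52", "S62", "S72",
    "S82", "S92", "M80", "T02", "T08", "T10", "T12", "T14.2",
)

# Excluded sub-ranges from the text: face (S02.2-S02.6),
# fingers (S62.5-S62.7), and toes (S92.4-S92.5).
EXCLUDED_PREFIXES = tuple(
    f"{base}.{digit}"
    for base, digits in (
        ("S02", range(2, 7)),
        ("S62", range(5, 8)),
        ("S92", range(4, 6)),
    )
    for digit in digits
)

# Falls from standing height or less, as listed in the text.
FALL_CODES = ("W01", "W05", "W06", "W07", "W08", "W10", "W18", "W19")


def is_motor_vehicle_cause(external_cause: str) -> bool:
    """True for external cause of injury codes V00-V99."""
    code = external_cause.strip().upper()
    return len(code) >= 3 and code[0] == "V" and code[1:3].isdigit()


def is_fracture_hospitalization(diagnosis: str, external_cause: str = "") -> bool:
    """Apply the inclusion list, then the site and motor vehicle exclusions."""
    code = diagnosis.strip().upper()
    if not code.startswith(FRACTURE_PREFIXES):
        return False
    if code.startswith(EXCLUDED_PREFIXES):
        return False
    return not is_motor_vehicle_cause(external_cause)


def is_fall_hospitalization(external_cause: str) -> bool:
    """True when the external cause code is one of the listed fall codes."""
    return external_cause.strip().upper().startswith(FALL_CODES)
```

Under these assumptions, a hip fracture coded "S72.0" with external cause "W01" counts as both a fracture and a fall hospitalization, whereas the same fracture with a V-code external cause is excluded from the fracture outcome.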

Bone measurements
Quantitative ultrasound (QUS) of the left calcaneus was obtained in duplicate using a Lunar Achilles Ultrasound machine (Lunar Corp., Madison, WI, USA) at baseline (1998) in 989 women in this study. The average measurements of broadband ultrasound attenuation, speed of sound, and stiffness index were recorded as detailed. [23] Total hip and femoral neck BMD of subjects was measured by DXA (Hologic Acclaim 4500A fan-beam densitometer; Hologic Corp, Waltham, MA, USA). The coefficient of variation at the total hip was 1.2% in our laboratory. [8]

Estimated fracture risk
Fracture risk prediction using the FRAX website (https://frax.shef.ac.uk/FRAX/) was conducted using FRAX Australia. The estimated 10-year major osteoporotic fracture risk was calculated using baseline data, with and without BMD. The dataset comprised the following clinical parameters: age, sex, weight (kg), height (cm), fracture history (including previous fracture and parental history of hip fracture), current smoking status, glucocorticoid use, history of rheumatoid arthritis or secondary osteoporosis, and alcohol consumption (three or more units per day). Data were inputted with and without femoral neck BMD (g/cm²).

Muscle strength and function assessment
Grip strength was assessed using an isometric Jamar hand dynamometer, with the peak value recorded from three attempts using the participant's dominant hand. The timed-up-and-go test (TUG) involves timing the participant's ability to rise from a chair, walk 3 m, turn, and return to sit on the chair. Participants were allowed to practice the test once before being timed.

Statistical analysis
Differences in baseline characteristics between the low and moderate to extensive ML-AAC24 categories were compared using independent t tests, the Mann-Whitney U test, or chi-square test where appropriate. Kaplan-Meier survival analysis was used to determine the univariate association of ML-AAC24 categories with 14.5-year fall and 10-year fracture-related hospitalization, as well as incident self-reported clinical fractures.
Cox proportional hazards regression models were used to investigate the association of ML-AAC24 categories with fall and fracture outcomes. To more closely align with our previous work, [7,8] the covariates included in the models differed for the falls and fracture analysis. For fractures, the model was adjusted for treatment (placebo/calcium) and FRAX with and without femoral neck BMD.
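For readers unfamiliar with the Kaplan-Meier method named above, the product-limit estimate can be sketched in a few lines. This is a minimal illustration on invented toy data, not the study's analysis code or data; the function name `kaplan_meier` and its inputs are hypothetical, and the statistical software actually used is not stated here.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each participant
    events : 1 if the outcome occurred at that time, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Toy data (years of follow-up); 1 = event, 0 = censored.
toy_times = [2, 3, 3, 5, 8, 10]
toy_events = [1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

In this toy example the estimated event-free proportion steps down at years 2, 3, and 5, and censored participants reduce the number at risk without producing a step. Comparing such curves between the low and moderate to extensive groups gives the univariate picture that the Cox models then adjust for covariates.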

Additional analyses
To further explore the relationship between ML-AAC24 and falls and fracture outcomes, we analyzed fracture outcomes using a multivariable-adjusted model based on the covariates included for falls. However, the multivariable model differed slightly from that used for falls due to the inclusion of FRAX in the model, which includes age, BMI, smoking status, and prevalent fractures. Therefore, this revised multivariable fracture model included treatment code, FRAX with femoral neck BMD, prevalent ASVD, prevalent diabetes mellitus, statin use, blood pressure-lowering medication use, socioeconomic status, physical activity, and prevalent falls. We also analyzed the relationship of AAC with fall and fracture outcomes using two alternative categorizations of ML-AAC24 used previously. [4,7] Specifically, we compared AAC presence (ML-AAC24 ≥ 1) and absence (ML-AAC24 = 0), as well as across three groups of AAC extent (low [ML-AAC24 ≤ 1], moderate [ML-AAC24 ≥ 2 to <6], and extensive [ML-AAC24 ≥ 6]).
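The three categorizations of ML-AAC24 used in the primary and additional analyses can be written out explicitly. The sketch below assumes scores are already expressed as whole numbers on the 0 to 24 scale; how the continuous model output was converted to that scale is not described here, so that step is deliberately left out. Function names are hypothetical.

```python
def _check(score: int) -> int:
    """Validate that a score lies on the 0-24 AAC-24 scale."""
    if not 0 <= score <= 24:
        raise ValueError(f"AAC-24 score out of range: {score}")
    return score


def two_group(score: int) -> str:
    """Primary analysis: low (<= 1) versus moderate to extensive (>= 2)."""
    return "low" if _check(score) <= 1 else "moderate to extensive"


def three_group(score: int) -> str:
    """Additional analysis: low (<= 1), moderate (2 to < 6), extensive (>= 6)."""
    score = _check(score)
    if score <= 1:
        return "low"
    if score < 6:
        return "moderate"
    return "extensive"


def presence(score: int) -> str:
    """Additional analysis: absent (0) versus present (>= 1)."""
    return "absent" if _check(score) == 0 else "present"
```

Note that a score of 1 is "present" under the presence/absence scheme but "low" under the other two, which is why the reference group differs between the analyses.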

Results
A total of 1023 women were included in the current study, with a mean ± standard deviation (SD) age of 75.0 ± 2.6 years at baseline. Baseline characteristics of the total sample, as well as for those with low and moderate to extensive ML-AAC24, are presented in Table 1. Compared with women with low ML-AAC24, women with moderate to extensive ML-AAC24 were slightly older, had lower BMI, were more commonly current or former smokers, and were more likely to have prevalent ASVD and be taking lipid-lowering medication. Compared with women with low ML-AAC24, women with moderate to extensive ML-AAC24 had lower hip BMD, lower ultrasound attenuation and stiffness index, and higher estimated major osteoporotic fracture risk. However, the proportion of women with any prevalent osteoporotic fracture since the age of 50 years, as well as the number, grade, or location of prevalent vertebral fractures, did not differ (Table 1). Grip strength, TUG, and the proportion of women with a self-reported prevalent fall in the 3 months prior to baseline did not differ between women with low or moderate to extensive ML-AAC24 (Table 1).

ML-AAC24 and 10-year clinical fractures
Over 10 years of follow-up, 24.7% (253/1023) of women experienced a clinical fracture (8124 person-years; mean ± SD follow-up 7.9 ± 3.0 years), and 16.5% (169/1023) of women had a fracture hospitalization (9111 person-years; mean ± SD follow-up 8.9 ± 2.2 years). Compared with women with low ML-AAC24, women with moderate to extensive ML-AAC24 had an increased risk of any clinical fracture, including vertebral and nonvertebral fractures (Table 2, Fig. 1). Hospitalization risk due to any fracture or hip fracture did not differ between women with low or moderate to extensive ML-AAC24. These results were consistent across all models of adjustment.

ML-AAC24 and 14.5-year fall-related hospitalizations
Over 14.5 years of follow-up (11,548 person-years; mean ± SD follow-up 11.3 ± 3.9 years), 38.4% (393/1023) of women experienced a fall-related hospitalization. Compared with women with low ML-AAC24, women with moderate to extensive ML-AAC24 had between 35% and 37% increased hazard for a fall-related hospitalization in the multivariable-adjusted model (Table 2, Fig. 1).
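The cumulative incidences and mean follow-up times reported in this section follow directly from the event counts and person-years, which can be checked with simple arithmetic. The sketch below also derives crude event rates per 1000 person-years; these rates are a derived illustration and are not figures reported by the authors. The names `OUTCOMES` and `summarize` are hypothetical.

```python
PARTICIPANTS = 1023

# (events, person-years) as reported in the text.
OUTCOMES = {
    "clinical fracture": (253, 8124),
    "fracture hospitalization": (169, 9111),
    "fall-related hospitalization": (393, 11548),
}


def summarize(events: int, person_years: float, participants: int = PARTICIPANTS) -> dict:
    """Cumulative incidence (%), mean follow-up (years), and crude rate per 1000 person-years."""
    return {
        "cumulative_incidence_pct": round(100 * events / participants, 1),
        "mean_follow_up_years": round(person_years / participants, 1),
        "rate_per_1000_py": round(1000 * events / person_years, 1),
    }


summaries = {name: summarize(*counts) for name, counts in OUTCOMES.items()}
```

The percentages (24.7%, 16.5%, 38.4%) and mean follow-up times (7.9, 8.9, 11.3 years) reproduce the reported values. The corresponding crude rates work out to roughly 31, 19, and 34 events per 1000 person-years, which makes the outcomes comparable despite the different follow-up windows.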

ML-AAC24 compared with manually assessed AAC24
Hazard ratios (HRs) for ML-AAC24 and 10-year incident fractures were consistent when AAC groupings based on manually assessed AAC24 were used in the current fracture analysis (Fig. 2). Similarly, HRs for ML-AAC24 and 14.5-year fall-related hospitalizations were consistent when manually assessed AAC24 was used in the multivariable-adjusted model (Fig. 2).

Additional analyses
When additional covariates were included in the analysis, to align with the primary multivariable-adjusted analysis for falls, HRs for moderate to extensive compared with low ML-AAC24 for all fracture outcomes were nearly identical (Table S1). When AAC was categorized as presence and absence (Table S2), or into three categories of extent (Table S3), results were largely consistent with those of the primary analysis. However, for presence versus absence of ML-AAC24, the relationship with clinical vertebral fractures was not significant, despite the HRs being similar to the primary analysis. Across three categories of AAC extent, extensive but not moderate AAC was associated with an increased risk of clinical vertebral fracture, compared with low AAC. Despite extensive AAC being associated with increased risk of clinical nonvertebral fracture and hip fracture hospitalization, neither association persisted once adjusted for FRAX with femoral neck BMD.

Note: Bolded values represent significant differences (p < 0.05). HRs (95% CI) analyzed using Cox proportional hazards models. AAC categorized as low (ML-AAC24 ≤ 1) and moderate to extensive (ML-AAC24 ≥ 2). n = 1023 for all fracture models and for unadjusted and minimally adjusted falls models; n = 998 for multivariable-adjusted falls model.
a Clinical vertebral and nonvertebral fracture numbers do not equal total clinical fracture numbers as women may have suffered more than one fracture type.

Discussion
This study reports that older women with moderate to extensive AAC, assessed using an automated ML algorithm, had an increased risk of incident fall-related hospitalizations and clinical fractures, compared with women with low ML-AAC24. Of note, ML-AAC24 was not associated with incident fracture hospitalizations. These findings remained consistent after adjusting for various covariates. Women with moderate to extensive ML-AAC24 also had lower BMD compared to women with low ML-AAC24. However, neither TUG performance nor prevalent fractures or falls differed between ML-AAC24 groups. These findings support the use of ML-AAC24 to routinely assess AAC on lateral spine images in clinical practice, to provide important prognostic information regarding clinical musculoskeletal outcomes.
The potential mechanisms underlying the observed relationships between AAC and falls and fracture risk have been discussed previously. [7,8,11] Briefly, this may relate to impaired blood flow to the skeleton distal to the abdominal aorta negatively influencing BMD, which may contribute to increasing fracture risk. Vascular disease may also contribute to increased falls propensity, and subsequent fracture, via various mechanisms. For example, vascular disease may lead to increased prescription of certain cardiovascular medications that may increase falls risk due to side effects such as orthostatic hypotension, confusion, or dehydration. [24] Falls may also result from transient ischemic events or syncope, or cardiac abnormalities such as arrhythmias and reduced cardiac output, which may be more common in individuals with vascular disease. [24] There are also shared genetic and environmental risk factors for muscle, bone, and vascular health. [7,8,11] However, the potential mechanisms require further investigation.
Although moderate to extensive ML-AAC24 was associated with incident self-reported clinical fractures, it was not associated with incident fracture-related hospitalizations. This was similar to our previous analysis, [8] where manually assessed AAC was not associated with fracture hospitalizations after accounting for BMD. In our additional analyses, extensive but not moderate ML-AAC24 was associated with an increased risk of hip fracture hospitalization; however, this was attenuated when FRAX with BMD was included in the model. These results may be in part due to reduced statistical power because there were fewer fracture hospitalizations, and particularly hip fracture hospitalizations, compared with clinical fractures.
The HRs for ML-AAC24 and incident fracture and falls risk in this study were largely consistent with those observed using manually assessed AAC in the analysis. Furthermore, the observed HRs for moderate to extensive ML-AAC24 and incident fracture outcomes are consistent with those reported in a recent meta-analysis of 86 studies and 61,553 participants. [11] In 28 studies (n = 33,748 participants) reporting fracture risk with any/advanced AAC compared with no/low AAC, the higher AAC group had a pooled relative risk of 1.51 (95% CI, 1.25-1.82) for any incident fracture. [11] In support of the current results showing poorer bone health with moderate to extensive ML-AAC24, a meta-analysis of 30 studies showed that those with any/advanced AAC had lower BMD at the total hip, femoral neck, and lumbar spine. [11] Overall, falls and fracture risk estimates using ML-AAC24 appear consistent with manually assessed AAC, which aligns with the good levels of agreement reported between the ML-AAC24 and AAC scored by trained imaging specialists. [13]
We and others have shown that the presence and extent of manually assessed AAC is associated with impaired musculoskeletal health, [7,8,11] increased risk for all-cause mortality and cardiac events, [6] as well as late-life dementia hospitalizations and deaths. [25] Recently, we reported that women with higher ML-AAC24 had increased risk of all-cause mortality and CVD-related mortality. [13] The present study builds on this evidence, demonstrating that higher ML-AAC24 is also associated with increased incident falls and clinical fracture risk. Statin use has been proposed to increase vascular calcification, [26,27] and there was a higher proportion of statin users in the moderate to extensive ML-AAC24 group of the current study. However, when statin use was included as a covariate in multivariable-adjusted falls and fracture models, the point estimates were nearly identical for all outcomes, suggesting the observed associations were independent of statin use. Importantly, the predictive value of ML-AAC24 does not appear to decrease with increasing follow-up duration, suggesting it is a valuable measure of long-term fall and fracture risk. However, the predictive value appears to be lower for imminent falls and fractures. In both the previous analysis [13] and the present analyses on falls and fractures, the observed associations with ML-AAC24 were consistent with those of manually assessed AAC. Together, these results highlight that ML-AAC24 provides important prognostic information about cardiovascular and musculoskeletal health, which is similar to that obtained when AAC is assessed manually by an experienced imaging specialist.
Clinical guidelines increasingly recommend performing lateral spine images at the time of DXA testing to identify asymptomatic vertebral fractures. [1] Further, even if a lateral spine image is obtained, AAC is not routinely assessed in clinical practice, largely due to practical considerations such as the time taken for manual scoring and the lack of expert assessors globally who are appropriately trained to score AAC. Therefore, methods to automate AAC assessment are needed to address these key barriers and support more widespread and routine use of AAC clinically at the time of osteoporosis screening. For example, although a lateral spine DXA scan takes 2-5 min to reposition and scan (depending on make and model), it takes even the most experienced reader approximately 5-6 min to obtain an AAC score from the image. In comparison, our recently developed ML algorithm takes less than a minute to predict the AAC score for hundreds of images. The ultimate goal of this work is for an automated AAC assessment to be incorporated into the software of common DXA machines. Although Hologic DXA scanners include software that helps to more easily visualize AAC from lateral spine images, it does not automatically detect or quantify AAC, and so still requires considerable time and expertise to obtain an AAC-24 score. An automated algorithm would allow AAC to be instantaneously assessed and reported on in clinical practice for any lateral spine image captured. Importantly, our algorithm was shown to have good levels of agreement with AAC that was manually assessed by trained experts. [13] This is consistent with the reported agreement between automatic and manual scoring of aortic calcification from abdominal CT scans. [14] The current study builds on this work, demonstrating that ML-AAC24 provides prognostic information about falls and fracture risk in older women. Given bone density scans involve a very low radiation dose and are routinely used for osteoporosis screening, performed most commonly in older women, the ability of this ML approach to provide AAC readings at the time of bone density testing is of clear clinical value, especially in the context of opportunistic health screening for other health outcomes including all-cause mortality and cardiovascular events, [6] as well as late-life dementia hospitalizations and deaths. [25]
Strengths of the present study include the large number of older Australian women included, who are representative of the population typically undergoing bone densitometry for osteoporosis screening; the long-term prospective follow-up (10 and 14.5 years for fracture and falls outcomes, respectively); the capture of clinical fracture events through verified self-report (by general practitioner); and the capture of fall- and fracture-related hospitalizations from the Western Australian Data Linkage System independent of self-report. There are also several limitations that should be considered. First, as acknowledged in the original study, [13] in the absence of an unseen hold-out separate dataset, the machine learning algorithm to assess AAC-24 scores was trained on the same set of images that were used in this study. However, to mitigate this limitation, a 10-fold cross-validation approach was employed to ensure that the algorithm was evaluated on a data partition (fold) that it had not seen during training, as described. [13] This process was repeated 10 times to obtain the test AAC24 scores across the entire dataset, ensuring that the algorithm's performance was assessed on unseen data with no prior knowledge of the test images. An important next step of this work will be to verify the performance of the ML-AAC24 algorithm in independent cohorts. Additionally, it is important to note that the algorithm was trained only on AAC-24 scores and has not been optimized/trained to predict long-term fall- or fracture-related risks/hospitalizations. Second, this was an observational study, so causality cannot be established. Third, we cannot exclude the possibility of bias being introduced, particularly as the lateral spine scans were only completed in approximately 70% of the larger cohort. Finally, the cohort consisted predominantly of community-dwelling older white women with a mean age of 75 years; therefore, these findings may not be generalizable to younger women, men, older women of other ethnicities, or to any clinical populations.
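The cross-validation safeguard described in the limitations can be illustrated with a short sketch of how out-of-fold scores are assembled: every image is scored only by a model that did not see it during training. This shows the partitioning logic under stated assumptions and is not the authors' training code; `stratified_folds`, `out_of_fold_predictions`, and the `fit` and `predict` callables are hypothetical stand-ins for the actual model training and inference.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, balancing labels across folds."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a similar share of this label.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


def out_of_fold_predictions(labels, fit, predict, k=10, seed=0):
    """Score every sample with a model trained on the other k-1 folds only."""
    folds = stratified_folds(labels, k, seed)
    predictions = [None] * len(labels)
    for test_indices in folds:
        held_out = set(test_indices)
        train_indices = [i for i in range(len(labels)) if i not in held_out]
        model = fit(train_indices)  # hypothetical training step
        for i in test_indices:
            predictions[i] = predict(model, i)  # hypothetical inference step
    return predictions
```

Stratifying by AAC category keeps the rarer extensive group represented in every fold. The design still differs from validation in an independent cohort, because all folds come from the same machines, readers, and population.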
In conclusion, we showed that higher ML-assessed AAC was associated with poorer bone health and an increased risk of incident clinical fractures and fall-related hospitalizations. Compared to manually assessed AAC, ML-AAC24 provides similar risk estimates for incident falls and clinical fractures. This work supports the assessment of ML-AAC24 on lateral spine images in clinical practice to identify individuals at greatest need of interventions to mitigate declines in musculoskeletal health to prevent falls and fracture, independent of traditional fall and fracture risk factors.

b Minimally adjusted = age, treatment code, and BMI.
c Multivariable adjusted = minimally adjusted model plus prevalent atherosclerotic vascular disease, smoked ever, prevalent diabetes mellitus, statin use, blood pressure-lowering medication use, socioeconomic status, physical activity, and self-reported prevalent falls.

Fig. 2. Hazard ratios (HRs) for fracture outcomes over 10 years and fall-related hospitalization risk over 14.5 years using machine learning AAC24 (ML-AAC24) and manually assessed AAC-24. HRs are for moderate to extensive AAC (AAC-24 ≥ 2) compared with low AAC (AAC-24 ≤ 1) as the reference category. The fracture models (n = 1023) are adjusted for treatment code and FRAX with femoral neck bone mineral density (BMD). The falls model (n = 998) is adjusted for age, treatment code, body mass index (BMI), prevalent atherosclerotic vascular disease, smoked ever, prevalent diabetes mellitus, statin use, blood pressure-lowering medication use, socioeconomic status, physical activity, and self-reported prevalent falls.

Table 1. Baseline Characteristics and Fall and Fracture Related Variables Stratified by Machine Learning Assessed Abdominal Aortic Calcification

Table 1. Continued
Note: Data expressed as mean ± SD, median (interquartile range), or number and (%). Bolded values represent significant differences (p value < 0.05) between AAC categories using t test, chi-square test, or Mann-Whitney U test where appropriate.

Table 2. Hazard Ratios (HRs) for Fracture Outcomes Over 10 Years and Fall-Related Hospitalization Risk Over 14.5 Years By Machine Learning Assessed Categories of Abdominal Aortic Calcification