Radiofrequency echographic multi spectrometry for the prediction of incident fragility fractures: A 5-year follow-up study

Purpose: To investigate the effectiveness of the T-score values provided by Radiofrequency Echographic Multi Spectrometry (REMS) in the identification of patients at risk for incident osteoporotic fractures. Methods: A population of Caucasian women (30–90 years), enrolled from 2013 to 2016, underwent dual X-ray absorptiometry (DXA) and REMS scans at axial sites. The incidence of fragility fractures was assessed during a follow-up period of up to 5 years. Afterwards, patients with and without incident fractures were stratified into two age-matched groups in a 1:2 proportion (Group F' and Group NF', respectively). The performance of the REMS T-score in discriminating between the two groups was quantitatively assessed and compared with that of DXA. Results: 1516 patients were enrolled and 1370 completed the follow-up (mean ± SD: 3.7 ± 0.8 years; range: 1.9–5.0 years). Fracture incidence was 14.0%. The age-matched groups included 175 fractured patients and 350 non-fractured ones, respectively (median age 70.2 [interquartile range: 61.0–73.3] and 67.3 [65.4–69.8] years, p-value ns). The groups were also balanced for height, weight and BMI (p-values ns). As expected, the differences in REMS T-score (for the vertebral site, −2.9 [−3.6 to −1.9] in Group F', −2.2 [−2.9 to −1.2] in Group NF') and DXA T-score (−2.8 [−3.3 to −1.9] in Group F', −2.2 [−2.9 to −1.4] in Group NF') were statistically significant (p-value < 0.001). Analogous results were obtained for the femoral neck. Considering the T-score cut-off of −2.5, REMS identified Group F' patients with a sensitivity of 65.1% and a specificity of 57.7% (OR = 2.6, 95%CI: 1.77–3.76, p < 0.001), whereas DXA showed a sensitivity of 57.1% and a specificity of 56.3% (OR = 1.7, 95%CI: 1.20–2.51, p-value = 0.0032). For the femoral neck, REMS sensitivity and specificity were 40.2% and 79.9%, respectively, with an OR of 2.81 (95%CI: 1.80–4.39, p < 0.001). DXA, instead, showed a sensitivity and specificity of 42.3% and 79.3%, respectively, with an OR of 2.68 (95%CI: 1.71–4.21, p < 0.001). Conclusions: The REMS T-score proved to be an effective predictor of the risk of incident fragility fractures in a population-based sample of female subjects, representing a promising parameter to enhance osteoporosis diagnosis in the clinical routine.


Introduction
Osteoporosis is a skeletal disorder in which deterioration of bone density and bone quality increases the risk of fracture [1]. It is usually described as a silent disease until a fracture occurs, and patients' self-perceived fracture risk is often lower than the clinically assessed risk [2]. It is a highly prevalent disease: fractures associated with osteoporosis are common and their number is projected to double globally over the next 3 decades [3]. In many Western populations, the risk of a fracture occurring in the remaining lifetime for people aged 50 or older is 50% for women and 20% for men, with fractures of the proximal femur and vertebral body being associated with excess mortality over a 5-year period following fracture for both men and women [4,5]. In Italy, it has been estimated that osteoporosis affects about 5 million people and the economic impact is very high: the cost of treating osteoporotic fractures exceeds 7 billion Euro per year, of which only 360,000 Euro is spent on secondary drug prevention [6]. Thus, osteoporosis and the associated increased risk of fracture have devastating consequences, both for patients' quality of life and for public health, resulting in heavy economic and social costs across the world [1]. Therefore, the identification of patients at high risk of fracture is essential, and high sensitivity and specificity levels should be pursued in this field [7]. According to the World Health Organization (WHO) diagnostic classification, a bone mineral density (BMD) at the hip or lumbar spine that is 2.5 standard deviations (SD) or more below the mean BMD of a young-adult reference population (corresponding to a T-score ≤ −2.5) defines the status of osteoporosis. It is important to underline that a diagnosis of osteoporosis is a risk factor for fractures, but the majority of fractures occur in a population with a non-osteoporotic status [8][9][10][11][12].
The limitations of the current technologies have resulted in underdiagnosis and undertreatment of osteoporosis [13] and have typically postponed the diagnosis of osteoporosis until after the occurrence of the first fracture. These considerations have encouraged research into diagnostic methods complementary or alternative to the current gold standard, dual X-ray absorptiometry (DXA). Several different approaches have been proposed, including additional DXA-based parameters, such as trabecular bone score (TBS) and hip-axis length; alternative X-ray methods, such as quantitative computed tomography (QCT) and high-resolution peripheral QCT; and alternative non-ionizing methods, such as magnetic resonance or quantitative ultrasound (QUS) [14]. As regards QUS, the heel is the most frequently analysed site. However, several studies have considered other peripheral anatomical sites, for example the 5-year follow-up study investigating the ability of QUS at the distal radius, tibia and phalanx to predict fractures [15]. Further clinical studies are nevertheless warranted to clearly assess the strengths of the various techniques and sites since, currently, no unequivocal results have been presented [16].
An innovative non-ionizing approach called Radiofrequency Echographic Multi Spectrometry (REMS) has been introduced. This novel technology is based on the analysis of the raw unfiltered ultrasound signals acquired during an echographic scan of the lumbar spine and/or femoral neck and provides a DXA-equivalent BMD value. The precision and diagnostic accuracy of REMS as compared to DXA have already been validated in both single-center and multicenter studies [16][17][18][19], and it has recently been presented as the first clinically available method for direct non-ionizing measurement of lumbar and femoral BMD for osteoporosis diagnosis and fracture risk prediction in the context of an expert consensus meeting organized by the European Society for Clinical and Economic Aspects of Osteoporosis, Osteoarthritis and Musculoskeletal Diseases (ESCEO) [14].
The aim of this study was to investigate the effectiveness of T-score values provided by REMS in the identification of patients at risk for incident osteoporotic fractures and to compare the performance of REMS with that of DXA.

Patients
This was a prospective observational study. The inclusion criteria were: Caucasian ethnicity, female sex, age 30–90 years, absence of significant walking impairment, medical prescription for an axial DXA investigation and provision of written informed consent. The patients were recruited from October 2013 to October 2016 at the Galateo Hospital in San Cesario di Lecce (Lecce, Italy). All the enrolled patients underwent a DXA investigation of the spine and femur and an echographic scan of the lumbar vertebrae and femoral neck performed with the REMS approach. All the examinations were performed in strict adherence to the corresponding applicable guidelines, as already described by Di Paola et al. [19] and briefly recalled later.
The study protocol was approved by the Ethics Review Boards of the Galateo Hospital in San Cesario di Lecce (Lecce, Italy). All the enrolled patients voluntarily entered the study after giving written informed consent. All the data were anonymized before being used for the statistical analysis.

Follow-up assessment
The assessment of the incident fragility fractures relied on medical reports based on imaging investigations, such as radiographs, vertebral morphometry, etc., acquired during a follow-up period lasting up to 5 years. Patients were contacted every six months by telephone interview to assess their health status, and the actual nature of the declared fractures was then verified as described. Traumatic fractures were excluded, whereas patients who suffered more than one fragility fracture were not excluded: in case of multiple fragility fractures, for the purposes of this study, we considered only the first fracture that occurred. Subsequently, the patients were stratified into two groups, i.e. those who suffered a fragility fracture during the follow-up period (Group F, fractured) and those who did not (Group NF, non-fractured). By iteratively excluding the oldest patients in Group F and the youngest patients in Group NF, two age-matched groups with an enrolment proportion of 1:2 were obtained. All other patient information was kept hidden during this selection process. The age-matched groups were labelled as Group F' (with incident fractures) and Group NF' (without incident fractures).
In more detail, the age-matching procedure was performed through the following steps: 1) the two patient groups (fractured and non-fractured) were separately ordered from the oldest to the youngest; 2) initially, we considered all the N fractured patients and the oldest 2N non-fractured ones and checked whether the age difference between these two groups was significant; 3) as long as the two groups were significantly different with respect to patient age, we removed the oldest fractured patient (so that the considered fractured patients became N-1), considered the oldest 2(N-1) non-fractured patients (which means that we removed the 2 youngest non-fractured patients from the 2N that had been considered in the previous step) and checked the significance of the age difference again.
Step 3) was iteratively repeated until we obtained two groups whose age difference was not significant: these groups were finally labelled as Group F' (with incident fractures) and Group NF' (without incident fractures).
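The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names `mann_whitney_p` and `age_match` are hypothetical, and the significance check is assumed to be a two-sided Mann-Whitney U test (normal approximation with tie correction, no continuity correction) at the 0.05 threshold stated in the statistical analysis section.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i                             # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))


def age_match(
    fractured_ages: Sequence[float],
    non_fractured_ages: Sequence[float],
    alpha: float = 0.05,
    ratio: int = 2,
    p_value: Callable[[Sequence[float], Sequence[float]], float] = mann_whitney_p,
) -> Tuple[List[float], List[float]]:
    """Return age-matched (fractured, non-fractured) age lists in a 1:ratio proportion."""
    f = sorted(fractured_ages, reverse=True)        # step 1: oldest first
    nf = sorted(non_fractured_ages, reverse=True)   # step 1: oldest first
    for removed in range(len(f)):
        group_f = f[removed:]                       # step 3: drop the oldest fractured
        group_nf = nf[: ratio * len(group_f)]       # keep the oldest ratio*N non-fractured
        if p_value(group_f, group_nf) >= alpha:     # stop when ages no longer differ
            return group_f, group_nf
    return [], []
```

The sketch works on ages only, mirroring the statement that all other patient information was hidden during selection; in practice each age would carry a patient identifier so that the matched groups can be recovered.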

DXA measurements
DXA scans were performed using a Discovery W (Hologic, Waltham, MA, USA) scanner according to the standard clinical routine procedures. Spinal investigations were carried out with hip and knee both at 90° of flexion, whereas during femoral scans the patient's femur was straight on the table, with the shaft being parallel to the vertical edge of the obtained image, and with a 15–25° internal rotation [20].
Medical reports always included both the DXA-based BMD value (expressed as g/cm²) and the corresponding T-score value, based on the standard reference database for Caucasian women integrated in the DXA scanner software. All the DXA medical reports were anonymized and digitally stored for the subsequent analyses. The DXA scanner employed underwent daily quality control and regular maintenance for the whole study period.

REMS acquisitions
REMS scans were performed employing a dedicated echographic device (EchoStation, Echolight Spa, Lecce, Italy), equipped with a convex transducer operating at the nominal frequency of 3.5 MHz and used as recommended by the manufacturer. Data processing methodologies implemented in the REMS approach were detailed in previous papers [17][18][19].
Lumbar scans were performed by placing the echographic transducer in a trans-abdominal position under the sternum, in order to initially visualize L1 lumbar vertebra and then moving it until L4 according to the on-screen and audible indications provided by the device software (EchoStudio, Echolight Spa, Lecce, Italy). Each lumbar scan lasted 80 s (20 s per vertebra) and it was followed by an automatic processing time of about 1-2 min.
Femoral neck scans were performed by placing the echographic transducer parallel to the head-neck axis of the femur, in order to visualize the typical proximal femur profile. Once the acquisition started, the operator held this image for 40 s, according to the on-screen and audible indications provided by the EchoStudio software, and then waited about 60 s for the automatic data processing.
For all the performed acquisitions, transducer focus and scan depth were adjusted for each patient in order to have the target bone interface in the ultrasound beam focal zone and at a distance of at least 3 cm from image bottom. The operators had an experience of at least 3 months with REMS.

Statistical analysis
All the statistical analyses were performed using MedCalc® software (version 19, MedCalc Software, Mariakerke, Belgium) and MATLAB® (MathWorks, Natick, MA). The threshold of statistical significance was set at p-value < 0.05. Continuous variables, i.e. age, height, weight, BMI, and REMS- and DXA-based T-score of each patient group, were presented as median and interquartile range or percentile values, and the Mann-Whitney U test was performed to assess statistically significant differences between the two patient groups. Discrete variables were reported as counts and percentage of the total sample.
The degree of correlation between DXA and REMS T-score values was quantified through Pearson's correlation coefficient (r). The agreement between DXA- and REMS-based T-score values was assessed through the Bland-Altman method, the residual standard deviation (RSD) about the regression line and Cohen's k.
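As an illustration of these agreement measures, the following minimal Python sketch computes Pearson's r, the Bland-Altman bias with its limits of agreement, and Cohen's k. It is not the MedCalc or MATLAB code used in the study. The function names and the toy T-score values are hypothetical, the limits are taken as bias ± 2 SD, and k is assumed here to be computed on the binary osteoporotic/non-osteoporotic classification at the −2.5 threshold.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two paired samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x: Sequence[float], y: Sequence[float], k: float = 2.0) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the paired differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - k * sd, bias + k * sd


def cohen_kappa(labels_a: List[bool], labels_b: List[bool]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    if expected == 1.0:          # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical T-score pairs, for illustration only (not study data).
dxa_t = [-3.0, -2.6, -2.4, -1.0, 0.5, -2.8]
rems_t = [-2.9, -2.4, -2.5, -1.2, 0.4, -3.0]

r = pearson_r(rems_t, dxa_t)
bias, lower, upper = bland_altman(rems_t, dxa_t)
kappa = cohen_kappa([t <= -2.5 for t in dxa_t], [t <= -2.5 for t in rems_t])
```

On a real dataset the same three quantities would be computed over all paired DXA and REMS measurements at a given anatomical site.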
Sensitivity, specificity and odds ratio (OR) for fracture were computed for DXA and REMS T-score values stratified at the threshold of −2.5, thus identifying osteoporotic patients (T-score ≤ −2.5) versus non-osteoporotic ones (T-score > −2.5). The effectiveness of DXA and REMS T-score in discriminating between patients who did and who did not sustain a fragility fracture during the follow-up period was evaluated by applying the receiver operating characteristic (ROC) curve approach, and the statistical significance of the difference in area under the curve (AUC) was assessed by the DeLong test [21]. The association of possible confounding factors (i.e. baseline patient characteristics) with the predictor (i.e. T-score values) was investigated through Pearson correlation. The effect of the resulting confounding factors was taken into consideration through confounding factors-adjusted ROC analysis. A sub-analysis on the correlation between REMS and DXA T-score values and site of fracture (in particular, involving vertebrae, upper limb or hip/femur) was performed using the Kruskal-Wallis test. The ability of DXA and REMS T-score values to discriminate between different fragility fracture sites (in particular considering vertebral fractures, hip fractures including pelvis and femur, and other anatomical sites) was assessed by ROC curve analysis.
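For readers less familiar with ROC analysis, the AUC of a T-score can be read as the probability that a randomly chosen fractured patient has a lower T-score than a randomly chosen non-fractured one, with ties counted as one half. The sketch below computes this rank-based (Mann-Whitney) estimate of the empirical AUC. It is an illustration only: the function name `auc_lower_is_risk` and the toy values are hypothetical, and the DeLong comparison between two correlated AUCs and the covariate adjustment are not implemented here.

```python
from typing import Sequence


def auc_lower_is_risk(fractured: Sequence[float], non_fractured: Sequence[float]) -> float:
    """Empirical AUC when LOWER scores indicate higher fracture risk.

    Counts, over all (fractured, non-fractured) pairs, how often the
    fractured patient has the lower T-score; ties contribute 0.5.
    """
    wins = 0.0
    for t_f in fractured:
        for t_nf in non_fractured:
            if t_f < t_nf:
                wins += 1.0
            elif t_f == t_nf:
                wins += 0.5
    return wins / (len(fractured) * len(non_fractured))


# Hypothetical T-scores, for illustration only (not study data).
example_auc = auc_lower_is_risk([-3.0, -2.8, -2.0], [-2.5, -1.0, -0.5])
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.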
As shown in Table 1, women who suffered an incident fragility fracture during follow-up were much older and slightly shorter than those who did not. Two age-matched subgroups with a 1:2 ratio (with and without incident fragility fractures, respectively) involving a total of 525 patients were obtained from the initial cohort of 1370 patients by considering the oldest 350 non-fractured patients and the youngest 175 fractured patients. Moreover, after the age-matching process, the two resulting groups were also balanced for all the anthropometric parameters.

Table 1
Patient demographics and baseline characteristics. Overall dataset data refer to the enrolled patients with the exclusion of drop-outs. Group F and Group NF divide the overall population into those who sustained a fragility fracture during follow-up and those who did not, respectively. Group F' and Group NF' refer to the age-matched population, with and without incident fragility fractures during follow-up, respectively. Results are reported as median value (25th–75th percentiles).

Agreement analysis between DXA and REMS T-score
Vertebral DXA and REMS T-score values were highly correlated, with Pearson's correlation coefficient r = 0.92 (p < 0.001) and RSD = 0.53 for the 1370-patient dataset (Fig. 1a). The distribution of the residuals, i.e. the differences between the REMS T-score values and the regression line, is shown in detail in Fig. 1b. The mean ± standard deviation of the difference distribution is 0.03 ± 0.52 for Group NF and −0.17 ± 0.51 for Group F. These statistics are substantially preserved if a subgroup regression analysis is performed, with r = 0.92 (RSD = 0.51) for Group F' and r = 0.91 (RSD = 0.52) for Group NF'. The slope of the regression line is 0.97 for the overall group, and 0.95 and 0.99 for the non-fractured and fractured patient subgroups, respectively. The agreement between the classification techniques, expressed as Cohen's k, is 0.8, namely at the boundary between good and very good strength of agreement [22]. These results were substantially the same for the femoral site, with r = 0.92 (RSD = 0.42) for the overall group, and r = 0.88 (RSD = 0.44) and r = 0.93 (RSD = 0.42) for Group F' and NF', respectively. The slope of the regression line is 1.03 for the overall group, and 1.00 and 1.03 for Group F' and Group NF', respectively. The agreement between the classification techniques, expressed as Cohen's k, is 0.79.
The differences between DXA and REMS T-score are shown through the Bland-Altman plot in Fig. 2: the average difference (expressed as bias ± 2 SDs) was 0.01 ± 1.06. A test of the null hypothesis that the mean difference is equal to 0 yielded a p-value of 0.52, i.e. the bias was not statistically significant.
In the discrimination between osteoporotic and non-osteoporotic patients considering DXA results as reference, REMS showed a sensitivity of 92.4% and a specificity of 94.4% for the overall dataset of patients and 93.7% and 90.8%, respectively, for the subgroup analysis. The analogous sensitivity and specificity values for femoral scans are 90.9% and 96.2% for the overall group, and 91.4% and 95.2% for the subgroup analysis, respectively. In more detail, for the vertebral site, the percentage of patients identified as "healthy" who did not experience incident fragility fractures was similar between DXA and REMS (75.6% and 74.5%, respectively). In contrast, 39.5% of the patients who were "osteoporotic" according to DXA fractured during the follow-up period, against 43.7% for REMS. Finally, the patients diagnosed as "osteopenic" by DXA suffered a fracture in 29.1% of cases, against 21.9% for REMS.

Fragility fracture occurrence
Considering the reference T-score threshold value of −2.5 to distinguish between osteoporotic and non-osteoporotic patients, the sensitivity and specificity of REMS at the vertebral site in the identification of incident fragility fracture were 65.1% and 57.7%, respectively, with an OR of 2.6 (95%CI: 1.77-3.76, p < 0.001). Vertebral DXA, instead, showed a sensitivity and specificity of 57.1% and 56.3%, respectively, with an OR of 1.7 (95%CI: 1.20-2.51, p = 0.0032).
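The arithmetic behind these figures can be illustrated from a 2×2 table. In the sketch below, the numbers of fractured patients classified as osteoporotic (114 for REMS, 100 for DXA) are those reported in the Discussion, whereas the non-fractured counts (202 and 197 true negatives out of 350) are back-calculated here from the reported specificities and are therefore assumptions, as is the use of the logarithmic (Woolf) confidence interval. The function name `diagnostic_summary` is hypothetical. Under these assumptions the sketch reproduces the reported sensitivities and specificities and gives odds ratios and intervals close to, but not exactly matching, the published ones.

```python
import math
from typing import Dict


def diagnostic_summary(tp: int, fn: int, fp: int, tn: int, z: float = 1.96) -> Dict[str, float]:
    """Sensitivity, specificity and odds ratio with a log-method 95% CI."""
    sensitivity = tp / (tp + fn)          # osteoporotic among fractured
    specificity = tn / (tn + fp)          # non-osteoporotic among non-fractured
    odds_ratio = (tp * tn) / (fp * fn)
    se_log_or = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "odds_ratio": odds_ratio,
        "ci_low": math.exp(math.log(odds_ratio) - z * se_log_or),
        "ci_high": math.exp(math.log(odds_ratio) + z * se_log_or),
    }


# Vertebral site, age-matched groups (175 fractured, 350 non-fractured).
# tn and fp are back-calculated from the reported specificities (assumption).
rems = diagnostic_summary(tp=114, fn=61, fp=148, tn=202)
dxa = diagnostic_summary(tp=100, fn=75, fp=153, tn=197)

for name, res in (("REMS", rems), ("DXA", dxa)):
    print(name, {key: round(value, 3) for key, value in res.items()})
```

Small discrepancies with the published intervals may stem from rounding of the reported percentages or from a different interval method in the statistical software.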
Lower values of vertebral T-score were associated with fractures at the vertebrae, wrist/forearm and hip for both DXA and REMS (Kruskal-Wallis test, p-value < 0.001).
The ROC curve comparison for vertebral DXA and REMS T-score in the age-matched groups (represented in Fig. 4a) illustrates the ability to discriminate patients with incident fragility fractures for both REMS T-score (AUC = 0.66) and DXA T-score (AUC = 0.61), and the difference between curves is statistically significant (p = 0.0002). Correspondingly, for the femoral site, AUCs for REMS and DXA were 0.64 and 0.65, respectively, and the difference between the curves was not statistically significant (p = 0.38). The patients' characteristics correlated with the outcome (i.e. age, height and BMI) were considered in order to obtain the covariate-adjusted ROC curves from the overall dataset of patients. However, since height and BMI are intrinsically correlated, we have presented only results related to age and BMI. Age presented a correlation with the investigated predictors, showing a Pearson correlation coefficient r = −0.27 and r = −0.30 for lumbar spine and femoral neck DXA T-score values, respectively, and r = −0.27 and r = −0.26 for lumbar spine and femoral neck REMS T-score values, respectively (p < 0.0001 in all cases). Similarly, considering the correlation between BMI and the investigated predictors, r = 0.27 and r = 0.30 for lumbar spine and femoral neck DXA T-score values, respectively, and r = 0.25 and r = 0.28 for lumbar spine and femoral neck REMS T-score values, respectively (p < 0.0001 in all cases).
Similarly, for femoral neck T-score values, the age-adjusted AUCs of DXA and REMS T-score values were 0.583 and 0.627 (p = 0.06), respectively. The BMI-adjusted AUCs of DXA and REMS T-score values were 0.674 and 0.695 (p = 0.24), respectively. When adjusting for age and BMI, the AUCs of DXA and REMS T-score values were 0.596 and 0.632 (p = 0.08), respectively.
The results of the analysis of ROC curves for different fracture sites, divided into vertebral fractures, hip fractures including femur and pelvis, and other fracture sites, are reported in Table 2. As expected, the performance of lumbar spine T-score improves when considering the subgroup of vertebral fractures, with AUC = 0.78 for both technologies. However, the results of this sub-analysis will be further investigated in future studies, since the ratios of cases to controls in the fracture subgroups of different sites are too small for a conclusive statement.
Fig. 3. Number of patients with fractures (upper bar graph) and fracture rate (lower bar graph) per 0.5 step of T-score. DXA and REMS results are represented in light grey and dark grey, respectively. The dashed and continuous vertical lines at T-scores of −2.5 and −1, respectively, graphically separate the "osteoporotic area" (left part of the graphs) from the "osteopenic area" (between the two lines) and the "healthy area" (right part of the graphs).

Discussion
As an epidemiologic snapshot, approximately 465,000 fragility fractures were sustained in Italy in 2010. The projections for 2025 foresee an increase of +23% of population above 50 years of age, +28% of total number of fractures (between +21% and +31% depending on the site of fracture) and a corresponding increase in burden of osteoporosis (without considering the loss in quality adjusted life years) of +23% reaching about €8.6 billion in Italy [23].
The demographic changes in the coming decades, with increasing life-expectancy and aging of the population, are expected to drive a marked increase in the incidence of osteoporotic fractures, thus underlining the importance of prevention and early identification of patients at high risk of fracture [24].
To date, the measurement of T-score is derived from DXA imaging, which gives an estimate of BMD. In 1994, the World Health Organization (WHO) recommended the T-score cutoff value of −2.5 or lower (and/or having a previous fragility fracture) for the identification of osteoporosis [25]. Several studies have demonstrated the predictive value of T-score, with fracture risk approximately doubling per unit decrease [26]. On the other hand, several data also indicate that it captures only approximately 50% of women with fragility fractures, with most individuals who sustain fragility fractures having a T-score above the cutoff of −2.5 [27,28].
Using osteoporosis and elderly age as criteria for therapeutic intervention against osteoporosis would probably reduce the population burden of fractures but solutions to the prevention of the remaining fragility fractures remain unavailable [28].
This incongruity poses a challenge to clinicians in the identification of patients who may benefit from osteoporosis treatments.
It is evident that there is room for more accurate predictive tools in this field, and some alternative diagnostic tools have been proposed over recent years [29][30][31], including DXA-based parameters, such as TBS, hip axis length and hip structural analysis [14], or specific questionnaire-based tools, focused on the integration of densitometric data with clinical risk factors, aiming to improve fracture risk prediction, especially in those patients with apparently normal BMD values. The best known among such tools are the Fracture Risk Assessment Tool (FRAX) and the Derived FRAX tool (DeFRA), both of which, for instance, have recently proved to be effective fracture predictors in postmenopausal diabetic women [32]. Other tools under investigation are based on QUS approaches, aiming to find a correlation between ultrasound-based parameters and BMD estimated by DXA [33][34][35] or between ultrasound-based parameters and bone fragility in specific sets of patients (e.g. elderly population [36] or post-menopausal women [37]), showing significant areas of development for osteoporosis diagnosis and monitoring also beyond the ionizing BMD-based approaches.
Nonetheless, despite the mentioned limitations, at present DXA is deemed the gold standard method. For this reason, in the present study we have considered the classification based on DXA T-score as reference in order to assess the predictive ability of REMS, i.e. a technology based on ultrasound acquisition which has been recently proposed as an additional method for osteoporosis diagnosis [17,18] and has been already validated in clinical multicenter studies [19,38]. The main advantages with the use of this technology, including the use of nonionizing radiation, the analysis of axial sites (i.e. femoral neck and lumbar vertebrae), the high accuracy and reproducibility of the results [14,19], increase the urgency for prospective studies.
This clinical trial showed the effectiveness of REMS in the identification of incident fragility fractures on the basis of the lumbar spine T-score classification (OR = 2.6, AUC = 0.66) in comparison with the discriminatory power of lumbar spine DXA T-score (OR = 1.7, AUC = 0.61). The ROC curve analysis showed consistent results with both the age-matched approach and the covariate-adjusted approach. On the other hand, the analysis of fracture subgroups should be further investigated in future works because of the limited number of cases per fracture type included in the present study.
As regards the DXA analysis, the results of the present study reflect what has already been presented in several studies, in terms of OR for fracture risk assessment (for instance, OR for vertebral fracture of 1.36 (95% CI: 1.26–1.47) per 0.10 g/cm² decrease in lumbar spine BMD [39] and average OR for hip fracture of 1.8 per T-score change measured at the lumbar spine [25]) and of discriminative ability in the identification of patients with osteoporotic fractures (for instance, AUC of 0.60–0.62 for the identification of subjects at risk for any or vertebral fractures using lumbar spine BMD [40,41] and of 0.59 for the identification of subjects at risk for vertebral fractures using lumbar spine DXA T-score [42]).
In this study, as expected, REMS vertebral T-score was significantly lower in patients who sustained a fragility fracture than in patients who did not (median value −2.9 versus −2.2 for fractured and non-fractured age-matched patients, respectively). Overall, the diagnostic agreement with DXA (84.8% for the vertebral site and 84.2% for the femoral site) was substantially in line with the results by Di Paola et al. [19], as well as the observed specificity and sensitivity in the classification between osteoporotic or non-osteoporotic patients.
For the vertebral site, considering the relationship between osteoporotic classification and occurrence of incident fragility fractures, as compared to DXA, REMS showed a higher ability in the identification of the true positives (osteoporotic patients who sustained an incident fragility fracture during follow-up) and a similar ability in the identification of true negatives (healthy patients who did not fracture during follow-up). For the femoral site, instead, the predictive ability is similar between REMS and DXA T-score, with non-significantly different AUCs. It is interesting to observe that, differently from other studies using DXA T-score to estimate fracture risk, in which the femoral site reached higher predictive performance than the vertebral site [43,44], using REMS we did not observe a conclusive superiority of one site with respect to the other, showing that good predictive performance might be obtained from both the axial sites.
Therefore, since both the techniques returned similar fracture probabilities for "healthy" patients, these results show some differences in the distribution of the remaining fragile patients between osteopenia and osteoporosis: DXA classified as "osteoporotic" 64.5% (100/155) of the fractured patients that were not labelled as "healthy", whereas the same happened for 74.5% (114/153) of the patients according to REMS.
The observed differences between the two technologies might be ascribed, at least partially, to the effects of the sources of potential artifacts. In fact, it has been shown that several artifacts on the DXA imaging can affect the scan results (e.g., in case of degenerative disc disease, osteoarthritis, sclerosis, osteophytes [45]), with osteophytes causing misdiagnosis in an estimated 10% of women with osteoporosis [46]. REMS, in principle, should be able to recognize and automatically remove raw signals that belong to calcifications, osteophytes and other possible sources of artifacts, but, to the best of our knowledge, dedicated studies are still lacking. However, the actual investigation of this specific aspect goes beyond the scope of the present work and will be addressed in further dedicated trials focused on comparing the diagnostic outcomes of patients with identified pathologic conditions that are likely to generate artifacts and possible misdiagnoses.
The present study has some limitations. The first is the use of non-specific enrolment criteria, which led in particular to the inclusion of young individuals with low fracture risk and caused the initial age imbalance between the patient cohorts with and without incident fractures, often reported as a bias in this kind of analysis [47,48]. However, this most important source of possible bias was eliminated through the age-matching procedure, at the expense of the final number of considered patients, which was nevertheless sufficient to retain statistical significance. A second possible limitation is that the main risk factors for fragility fractures were not taken into account, and this may have affected the percentages of fractured patients. Nevertheless, the only consequence of this was to limit the direct comparability with literature-available data on fracture incidence, since the specific results and conclusions of the present study would not be modified by the knowledge of the clinical risk factors. Finally, we aimed at predicting the occurrence of incident fragility fractures on the basis of BMD only, although it is well known that BMD is just one of the determinants of bone strength, which is also strongly influenced by bone quality parameters. Although this has limited the performance of both techniques in the accurate identification of fractured patients, the comparative evaluation of the effectiveness of BMD alone was reasonably reliable, whereas the integration of additional parameters directly related to the assessment of bone strength will be the subject of subsequent studies. The first parameter to be tested for this purpose will be the Fragility Score, a REMS-based estimator of bone structure quality that is independent of BMD and has been preliminarily introduced in the literature [49,50].

Conclusion
The REMS T-score proved to be an effective predictor of the occurrence of incident fragility fractures in a population-based sample of women, representing a promising approach to enhance osteoporosis diagnosis in the clinical routine. The REMS T-score could also be a suitable parameter to be integrated in fracture risk prediction tools like FRAX and DeFRA in order to increase their effectiveness. Further studies will investigate possible improvements in REMS-based identification of fragile patients through the combination of T-score with additional independent parameters related to bone structure quality, such as the Fragility Score.

Declaration of competing interest
None.