The Performance of a Calcaneal Quantitative Ultrasound Device, CM-200, in Stratifying Osteoporosis Risk among Malaysian Population Aged 40 Years and Above.

BACKGROUND
Calcaneal quantitative ultrasound (QUS) is widely used in osteoporosis screening, but the cut-off values for risk stratification remain unclear. This study validates the performance of a calcaneal QUS device (CM-200) using dual-energy X-ray absorptiometry (DXA) as the reference and establishes a new set of cut-off values for CM-200 in identifying subjects with osteoporosis.


METHODS
The bone health status of Malaysians aged ≥40 years was assessed using CM-200 and DXA. Sensitivity, specificity, area under the curve (AUC) and the optimal cut-off values for risk stratification of CM-200 were determined using receiver operating characteristic (ROC) curves and Youden's index (J).


RESULTS
From the data of 786 subjects, CM-200 (QUS T-score <-1) showed a sensitivity of 82.1% (95% CI: 77.9-85.7%), specificity of 51.5% (95% CI: 46.5-56.6%) and AUC of 0.668 (95% CI: 0.630-0.706) in identifying subjects with suboptimal bone health (DXA T-score <-1) (p < 0.001). At QUS T-score ≤-2.5, CM-200 was ineffective in identifying subjects with osteoporosis (DXA T-score ≤-2.5) (sensitivity 14.4% (95% CI: 8.1-23.0%); specificity 96.1% (95% CI: 94.4-97.4%); AUC 0.553 (95% CI: 0.488-0.617); p > 0.05). Modified cut-off values for the QUS T-score improved the performance of CM-200 in identifying subjects with osteopenia (sensitivity 67.7% (95% CI: 62.8-72.3%); specificity 72.8% (95% CI: 68.1-77.2%); J = 0.405; AUC 0.702 (95% CI: 0.666-0.739); p < 0.001) and osteoporosis (sensitivity 79.4% (95% CI: 70.0-86.9%); specificity 61.8% (95% CI: 58.1-65.5%); J = 0.412; AUC 0.706 (95% CI: 0.654-0.758); p < 0.001).


CONCLUSION
The modified cut-off values significantly improved the performance of CM-200 in identifying individuals with osteoporosis. Since these values are device-specific, optimization is necessary for accurate detection of individuals at risk for osteoporosis using QUS.


Introduction
The rapid ageing of the global population brings forth many noncommunicable diseases, including osteoporosis and its associated fragility fractures. Apart from bearing the direct and indirect medical costs, patients with fractures suffer from chronic pain, loss of independence and increased mortality [1]. A pooled analysis of data from the United States, Australia, Asia and the Middle East Crescent shows that hip fractures are particularly devastating because 20-40% of people suffering from hip fractures would die within a year and 10% of the survivors would suffer contralateral hip fractures [2]. By 2050, 50% of all hip fractures worldwide are expected to occur in Asia [3]. Malaysia has been predicted to experience a 3.55-fold increase in hip fracture incidence by 2050 compared to 2018, which is one of the highest increases among Asian countries [4]. Early detection of bone loss and timely prophylaxis are critical in retarding the progression of osteoporosis among high-risk individuals.
Osteoporosis is diagnosed based on bone mineral density (BMD) at the lumbar spine or proximal femur measured by dual-energy X-ray absorptiometry (DXA), which serves as the gold standard technique in clinical settings [5,6]. The individual BMD is then compared against the reference young adult mean value to generate a T-score. According to guidelines by the World Health Organization (WHO), a BMD value ≤−2.5 standard deviations (SD) of the young adult mean (or a T-score ≤−2.5) indicates osteoporosis, while a T-score between −1 and −2.5 indicates osteopenia [7]. DXA can predict fracture risk and monitor response to treatment [8][9][10]. Moreover, it can determine whole-body and regional body composition [11,12]. Despite its crucial role in clinical settings, the accessibility of DXA is limited by the cost of the machine and procedure and by its availability in developing countries. As of 2012, there were about 100 DXA machines available throughout the whole of Malaysia [13]. Hence, DXA is not suitable for public screening of osteoporosis [13].
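The T-score described above can be written as a standardized difference. The following is a minimal sketch of the conventional definition; the symbols (BMD of the subject, and the mean and SD of the young adult reference population) are notation introduced here for illustration and do not appear in the original text.

```latex
T = \frac{\mathrm{BMD}_{\text{subject}} - \mu_{\text{young adult}}}{\sigma_{\text{young adult}}},
\qquad
\text{osteoporosis: } T \le -2.5,
\qquad
\text{osteopenia: } -2.5 < T < -1
```

Because the reference mean and SD enter the formula directly, the same measured BMD can yield different T-scores under different reference populations.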
Quantitative ultrasound (QUS) offers a portable, radiation-free and relatively less costly method of bone health screening compared to DXA [14]. In general, QUS uses transmission time of ultrasound (speed of sound; SOS) or attenuation of sound signals/frequency (broadband ultrasound attenuation; BUA) across the body part measured to determine bone health [15]. However, the algorithms for generating QUS indices differ between manufacturers and the results may not be comparable. Previous studies demonstrated that QUS indices associate significantly with bone mass, microarchitecture and fracture risk in population studies [16,17]. According to the International Society for Clinical Densitometry (ISCD), the calcaneus is the only validated anatomical site recommended for osteoporosis screening [18]. Previous studies have shown that calcaneal QUS is useful for early detection of osteoporosis [19][20][21]. However, the use of WHO's T-score cut-offs for osteoporosis and osteopenia in QUS remains controversial because the cut-off values of <−1 and ≤−2.5 are established based on DXA and the skeletal properties examined are different between QUS and DXA [18].
The use of CM-200, a calcaneal QUS device, in population studies to investigate bone health has been reported [15,22-26]. Since this device has built-in reference data for the Japanese population, it has been used to assess the bone health status of the Asian population. Most studies using CM-200 adopted the WHO's T-score cut-off based on DXA, which is not recommended by ISCD [18]. Thus, the current study aimed to validate the performance of CM-200 against DXA and establish a new set of cut-off values in identifying subjects with osteoporosis in a population aged ≥40 years and residing in Klang Valley, Malaysia.

Materials and Methods
This cross-sectional study was conducted from April 2018 to April 2019 in Klang Valley, Malaysia. The protocol of the main study has been described previously [27][28][29]. Community-living Malaysian subjects aged ≥40 years were recruited using quota sampling technique, whereby subjects were stratified based on the demographic population in Klang Valley, which is 45% Malays, 45% Chinese and 10% Indians and other ethnic groups [30]. Invitations with specific inclusion and exclusion criteria were sent to community centers in Klang Valley and advertised in local newspapers and radio. Subjects with conditions that alter bone metabolism (hypo/hyperthyroidism, hypo/hypergonadism, hypo/hypercalcemia), previously diagnosed with bone diseases (osteogenesis imperfecta, osteomalacia, Paget's disease), receiving therapeutic agents (thiazide diuretics, glucocorticoids, osteoporosis treatment agents), having mobility issues, with implants (at lumbar spine, hip and lower limbs) and those who failed to complete the screening procedure were excluded from the study. Informed consent was obtained from all the subjects before their enrolment. Research Ethics Committee of Universiti Kebangsaan Malaysia (Code: UKM PPI/111/8/JEP-2017-761) reviewed and approved the protocol of the main study on 27 December 2017.
Sample size calculation was performed using MedCalc (MedCalc Software Ltd., Ostend, Belgium) to ensure that the number of subjects recruited was sufficient for receiver operating characteristic (ROC) analysis [31]. The following values were used for the calculation: type-1 error = 0.05; type-2 error = 0.20; target area under the curve (AUC) = 0.7; AUC null hypothesis = 0.5; ratio of negative/positive cases = 689/97 (nonosteoporosis/osteoporosis) [27]. The minimal sample size derived was 19 positive cases and 135 negative cases. Thus, the number of subjects recruited for the main study was sufficient for ROC analysis.
All subjects completed a demographic questionnaire prior to their bone health assessment. Their age was determined from records on their identification cards. Their sex, ethnicity and presence of medical conditions were self-declared. Standing height of the subjects without shoes was measured to the nearest 1 cm using a stadiometer (Seca, Hamburg, Germany). Body weight of the subjects with light clothing but without shoes was measured to the nearest 0.1 kg using a weighing scale (Tanita, Tokyo, Japan). Body mass index (BMI: kg/m²) was calculated as per the convention. For subjects aged <65 years, BMI was classified as underweight (<18.5 kg/m²), normal (18.5-24.9 kg/m²) or overweight (>24.9 kg/m²) [32]. For subjects aged >65 years, a BMI <22 kg/m² was classified as underweight, 22-27 kg/m² as normal and >27 kg/m² as overweight [33]. For the analysis, both underweight and normal BMI were grouped, while overweight was set as another group.
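The age-dependent BMI classification above can be sketched as follows. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, and because the text gives cut-offs for subjects aged <65 and >65 years, assigning subjects aged exactly 65 to the older group is an assumption made here.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight (kg) and standing height (cm)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def bmi_category(bmi_value, age_years):
    """Classify BMI using the age-specific cut-offs quoted in the text."""
    if age_years < 65:
        low, high = 18.5, 24.9
    else:
        # Assumption: age exactly 65 is treated like >65 (not stated in the text).
        low, high = 22.0, 27.0
    if bmi_value < low:
        return "underweight"
    if bmi_value <= high:
        return "normal"
    return "overweight"


def analysis_group(category):
    """Collapse the three categories into the two groups used for analysis."""
    return "overweight" if category == "overweight" else "underweight/normal"
```

For example, a BMI of 26 kg/m² would be grouped as overweight at age 50 but as underweight/normal at age 70 under these cut-offs.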
BMD at the lumbar spine (anteroposterior, L1-L4) and left hip of the subjects was assessed using DXA (Hologic Discovery QDR Wi densitometer, Hologic, MA, USA) operated by a single trained technician, blinded to the results of QUS, throughout the study period. The WHO criteria were used to categorize the bone health status of subjects as osteoporosis (T-score ≤ −2.5), osteopenia (T-score < −1 and > −2.5) or normal (T-score > −1.0) based on either lumbar spine or left hip T-score values. The ethnic-based reference BMD of Singaporean young adults was used in the calculation of the T-score as per the recommendation of the Malaysian Osteoporosis Society [34]. This reference was included in the DXA software provided by the manufacturer. The DXA measurements were performed as per the standard protocol. The subjects were required to wear light clothing and lie supine on the DXA machine. The technician positioned the subjects accordingly for the scans. Daily calibration was conducted using a phantom. Short-term in-vivo coefficients of variation for this device were 1.8% and 1.2% for the lumbar spine and total hip, respectively.
Quantitative ultrasound measurement was performed using the gel-based CM-200 bone ultrasonometer (Furuno, Nishinomiya City, Japan) operated by another trained technician, blinded to the results of DXA, throughout the study period. This device uses SOS in m/s as the parameter to assess the bone health status of the subjects. For the present study, T-score generated based on the SOS was used to classify the bone health status of the subjects. T-score values obtained were based on the Japanese population reference provided by the manufacturer. A T-score ≤ −2.5 indicates high risk for osteoporosis, T-score < −1 and > −2.5 indicates moderate risk for osteoporosis, and T-score > −1.0 indicates low risk for osteoporosis [15,22].
The Kolmogorov-Smirnov test was used to determine the normality of the data. The independent t-test was used to test for significant differences in the basic characteristics of subjects. Receiver operating characteristic (ROC) curves were generated to assess the performance of QUS in identifying subjects with suboptimal bone health (osteopenia + osteoporosis) and osteoporosis using DXA as the reference. The suboptimal bone health group is important because individuals with osteoporosis and osteopenia are both susceptible to fragility fractures [35,36]. In the first ROC, subjects with osteopenia or osteoporosis were grouped as suboptimal bone health. In the second ROC, subjects were divided into groups with or without osteoporosis (the latter included normal and osteopenia). Sensitivity, specificity and AUC were determined. Correlation between DXA and QUS T-scores (continuous data) was assessed using Pearson's correlation, while Cohen's kappa statistics determined the agreement between the bone health status categorized by both techniques (ordinal data). A kappa value (κ) ≤ 0.20 indicates no agreement, while 0.21 ≤ κ ≤ 0.39 indicates minimal agreement, 0.40 ≤ κ ≤ 0.59 indicates weak agreement, 0.60 ≤ κ ≤ 0.79 indicates moderate agreement, 0.80 ≤ κ ≤ 0.90 indicates strong agreement and κ > 0.90 indicates perfect agreement [37]. Optimal cut-offs of QUS T-score were obtained by tracing the coordinates of ROC curves and Youden's Index (J = sensitivity + specificity − 1). The cut-offs with the highest Youden's Index were selected as the optimal cut-offs [38]. Significance value was set at p < 0.05 (two-tailed). All statistical analyses were performed using Statistical Package for the Social Sciences (SPSS) version 22.0 (IBM, Armonk, NY, USA).
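The selection of an optimal cut-off by Youden's Index can be illustrated with a short sketch. This is not the SPSS procedure used in the study: the function name and the toy data below are hypothetical, and a positive test is assumed to mean a QUS T-score at or below the candidate cut-off.

```python
def youden_optimal_cutoff(scores, has_condition):
    """Return (cutoff, J, sensitivity, specificity) maximising Youden's J.

    scores        : test values (lower = worse bone health)
    has_condition : booleans from the reference standard (True = diseased)
    A subject tests positive when score <= cutoff (assumption).
    """
    positives = sum(1 for d in has_condition if d)
    negatives = len(has_condition) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, has_condition) if d and s <= cutoff)
        tn = sum(1 for s, d in zip(scores, has_condition) if not d and s > cutoff)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        # Keep the first cut-off that reaches the highest J.
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Made-up toy data for illustration only (not study data).
toy_scores = [-3.0, -2.0, -1.8, -1.5, -1.0, -0.5, 0.0, 0.5]
toy_status = [True, True, True, False, False, True, False, False]
cutoff, j, sens, spec = youden_optimal_cutoff(toy_scores, toy_status)
```

With the toy data, the cut-off that maximises J sits where sensitivity and specificity are jointly highest, which is the same criterion the study applies to the ROC coordinates.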

Results
A total of 910 subjects were recruited in the study, but 20 of them were excluded for receiving thiazide diuretics, 32 for glucocorticoids, 4 for cancer treatment, 30 for hormone treatment, 12 for having mobility problems, 5 for hysterectomy before menopause and 21 for not completing the study procedure. Data from the remaining 786 subjects were included in the final analysis, of which 48.6% were men and 51.4% were women. The overall prevalence of osteoporosis determined by DXA was 12.3%, but it was higher in women compared to men. Based on QUS, more than half of the subjects were categorised as having moderate risk for osteoporosis, followed by normal and high risk of osteoporosis (Table 1). Kappa and correlation statistics revealed that the agreement and association for osteoporosis risk stratification between CM-200 and DXA were minimal for men, women and the overall study population. DXA T-scores correlated positively and significantly with QUS T-scores of the subjects. The strength of association between QUS and DXA T-scores was similar between women and men (Table 2). The performance of CM-200 using DXA as reference was assessed through ROC analysis (Figure 1). At T-score < −1.0, CM-200 showed a sensitivity of 82.1% (95% CI: 77.9-85.7%), specificity of 51.5% (95% CI: 46.5-56.6%) and AUC of 0.668 (95% CI: 0.630-0.706; p < 0.001) in identifying subjects with suboptimal bone health (DXA T-score < −1). At T-score ≤ −2.5, CM-200 showed a sensitivity of 14.4% (95% CI: 8.1-23.0%), specificity of 96.1% (95% CI: 94.4-97.4%) and AUC of 0.553 (95% CI: 0.488-0.617; p = 0.093) in identifying subjects with osteoporosis (DXA T-score ≤ −2.5). Sub-analysis according to sex, age, ethnicity and BMI showed similar results, whereby the sensitivity and AUC of CM-200 in detecting subjects with suboptimal bone health were better than in detecting subjects with osteoporosis.
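As a worked check of the osteoporosis result at QUS T-score ≤ −2.5, sensitivity, specificity and Youden's Index can be recomputed from a 2 × 2 table. The cell counts below are not reported in the text; they are inferred here from the reported percentages together with the 97 osteoporosis and 689 nonosteoporosis cases stated in the sample size calculation, and should be read as an assumption.

```python
def diagnostic_summary(tp, fn, tn, fp):
    """Sensitivity, specificity and Youden's J from 2x2 table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity - 1


# Inferred counts (assumption): 14 of 97 osteoporosis cases flagged,
# 662 of 689 nonosteoporosis cases correctly not flagged.
sens, spec, j = diagnostic_summary(tp=14, fn=83, tn=662, fp=27)

print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}, J = {j:.3f}")
```

Under these inferred counts, the Youden's Index at the conventional cut-off is far below the values of about 0.4 reported for the modified cut-offs, which is consistent with the low sensitivity described above.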
The performance of CM-200 in determining men and women with suboptimal bone health or osteoporosis was found to be similar. For men, there was no obvious age trend for the performance of CM-200. However, the sensitivity of CM-200 was higher in older compared to younger women. Among men, the performance of CM-200 in detecting subjects with suboptimal bone health was better in Malay and Chinese men compared to Indians or other ethnic groups, whereas it did not differ among women. The performance of CM-200 also did not differ between the BMI groups in identifying subjects with suboptimal bone health (Tables 3 and 4).
Modification of the QUS T-score of CM-200 was attempted to determine its optimal cut-off values in identifying subjects with suboptimal bone health or osteoporosis. At cut-off < −1.32, an improvement in the performance of CM-200 in identifying men with suboptimal bone health was observed compared to cut-off < −1. Similarly, at cut-off < −1.61, the performance of CM-200 in identifying men with osteoporosis also improved compared to cut-off ≤ −2.5. For women, the optimal QUS T-score cut-off value for suboptimal bone health was < −1.37. At cut-off ≤ −1.43, the performance of CM-200 in identifying women with osteoporosis was improved compared to cut-off ≤ −2.5 (Table 5).
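The sex-specific modified cut-offs reported above can be expressed as a simple lookup. This is a hedged sketch: the dictionary and function names are hypothetical, and the strict (<) versus inclusive (≤) comparisons are copied exactly as reported in the text.

```python
# (cut-off, inclusive) pairs as reported; inclusive=True means "<=".
MODIFIED_CUTOFFS = {
    "men": {"suboptimal": (-1.32, False), "osteoporosis": (-1.61, False)},
    "women": {"suboptimal": (-1.37, False), "osteoporosis": (-1.43, True)},
}


def flagged(qus_t_score, sex, target):
    """Return True if the QUS T-score falls below the modified cut-off.

    sex    : "men" or "women"
    target : "suboptimal" (osteopenia or osteoporosis) or "osteoporosis"
    """
    cutoff, inclusive = MODIFIED_CUTOFFS[sex][target]
    if inclusive:
        return qus_t_score <= cutoff
    return qus_t_score < cutoff
```

For instance, a man with a QUS T-score of −1.5 would be flagged for suboptimal bone health but not for osteoporosis under these cut-offs.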

Discussion
The current study indicated that the agreement between QUS and DXA at the current cut-offs was significant but weak. Since limited studies reported the kappa coefficient between QUS and DXA, comparison with other devices is not possible. This study also found that QUS and DXA correlated significantly in predicting suboptimal bone health and osteoporosis among the subjects. At the current cut-offs, CM-200 could identify subjects with suboptimal bone health but not subjects with osteoporosis. The modified cut-offs improved the performance of CM-200 in identifying subjects with suboptimal bone health and osteoporosis.
The performance of QUS (T-score < −1) in identifying subjects with suboptimal bone health (DXA T-score < −1) was fair based on AUC, sensitivity and specificity values (Table 3). Although being underweight, being a woman and being Chinese were reported to be predictors of osteoporosis and suboptimal bone health in the same cohort of subjects [27], sub-analysis based on sex, age, ethnicity and BMI did not alter the results significantly (Table 3), except among Indian men and those from other ethnicities, probably due to the small sample size (n = 41). On the other hand, QUS was not effective in identifying subjects with osteoporosis in the present study (Table 4). This observation contradicts earlier studies which reported a satisfactory performance of QUS (of different manufacturers) in identifying people with osteoporosis defined by T-score [21,39]. Among 109 Turkish men (mean age: 57.8 ± 13.7 years) and 131 women (mean age: 53.7 ± 11.9 years), both BUA and SOS indices of a calcaneal QUS device (Sahara, Hologic) demonstrated an acceptable performance (AUC for men: BUA = 0.661, SOS = 0.735; women: BUA = 0.712, SOS = 0.764) in identifying subjects with osteoporosis (DXA T-score ≤ −2.5) [40]. McLeod et al. (2015) reported good performance of a calcaneal QUS device (Lunar Achilles) in identifying patients with osteoporosis among 174 Canadian women (59.7 ± 6.7 years). The QUS device showed AUCs of 0.892 (using DXA T-score ≤ −2.5 at the femoral neck as reference) and 0.696 (using DXA T-score ≤ −2.5 at the lumbar spine as reference) at QUS T-score ≤ −2.5 [41]. The reasons for the discrepancies between our study and the aforementioned studies are not clear.
Since the use of WHO's cut-offs based on DXA T-score for QUS assessment is not recommended [18], the present study has determined the optimal QUS T-score for CM-200 based on sex to identify subjects with suboptimal bone health and osteoporosis. The modified cut-offs of QUS T-score for CM-200 improved the sensitivity of QUS in identifying subjects at risk of osteoporosis and men with suboptimal bone health. Particularly, the performance of CM-200 in identifying subjects at risk of osteoporosis changed from ineffective to effective (Table 5). Some of the previous studies have applied modified QUS T-scores to determine the optimal performance of QUS in identifying subjects at risk of osteoporosis [21,42-44]. Oral et al. (2019) found that at the modified QUS T-scores of −1.68 and −1.53 for men (mean age: 57.8 ± 13.7 years) and women (mean age: 53.7 ± 11.9 years), respectively, the sensitivity and specificity values obtained were around 70% [40]. Nevertheless, it should be emphasized that the comparison of the performance and cut-off values across QUS devices from different manufacturers is not recommended due to the different algorithms or reference databases used.
There are several limitations of this study to be addressed. A nonrandomized technique was used to recruit the subjects; thus, generalization of the results of this study should be made with caution. However, the ethnic composition of the subjects resembles the ethnic demography in Klang Valley [30]. The subjects were healthier than the general population because subjects with strong secondary risk factors for osteoporosis had been excluded. Thus, the results of this study might be applied only to individuals without strong secondary risk factors for osteoporosis. The CM-200 device only generates SOS but does not generate other QUS indices, and its reference population is different from that of the DXA used in this study. Therefore, we cannot guarantee that the results of this study can be replicated in other populations with different characteristics or with QUS devices from other manufacturers. Subgroup analysis among subjects categorized as Indian and underweight may be underpowered due to the low sample size. The sample size was calculated on the assumption that the modified cut-offs for CM-200 could reach an AUC of 0.7. We recalculated the sample size needed for the group with the lowest AUC (0.685 for identification of women with osteoporosis; Table 5) and found that the sample size required was still within an acceptable range (n = 128). Lastly, a validation cohort may be required to prove the effectiveness of the modified cut-off values generated in identifying individuals at risk of osteoporosis. Some of the aspects that should be investigated in future research include the clinical significance and cost-benefit of using CM-200 in the screening of osteoporosis. Nevertheless, the information from this study can serve as a guide for future studies using CM-200 in public screening of bone health.

Conclusion
The performance of the CM-200 calcaneal QUS device using cut-offs similar to DXA is fair in identifying subjects with suboptimal bone health, but it cannot identify subjects with osteoporosis. The performance of CM-200 improves significantly when a new set of cut-off values is adopted. Therefore, it is recommended that QUS devices be optimized and their cut-off values validated before they are deployed in the local setting, to ensure optimal performance in identifying individuals at risk of osteoporosis.