
Artificial intelligence in fracture detection with different image modalities and data types: A systematic review and meta-analysis

  • Jongyun Jung,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Biomedical Informatics (Dr. Qing Wu, Jongyun Jung, and Jingyuan Dai), College of Medicine, The Ohio State University, Columbus, Ohio, United States of America

  • Jingyuan Dai,

    Roles Conceptualization, Data curation, Writing – review & editing

    Affiliation Department of Biomedical Informatics (Dr. Qing Wu, Jongyun Jung, and Jingyuan Dai), College of Medicine, The Ohio State University, Columbus, Ohio, United States of America

  • Bowen Liu,

    Roles Writing – review & editing

    Affiliation Department of Mathematics and Statistics, Division of Computing, Analytics, and Mathematics, School of Science and Engineering (Bowen Liu), University of Missouri-Kansas City, Kansas City, Missouri, United States of America

  • Qing Wu

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Resources, Writing – original draft, Writing – review & editing

    Qing.Wu@osumc.edu

    Affiliation Department of Biomedical Informatics (Dr. Qing Wu, Jongyun Jung, and Jingyuan Dai), College of Medicine, The Ohio State University, Columbus, Ohio, United States of America

Abstract

Artificial Intelligence (AI), encompassing Machine Learning and Deep Learning, has increasingly been applied to fracture detection using diverse imaging modalities and data types. This systematic review and meta-analysis aimed to assess the efficacy of AI in detecting fractures through various imaging modalities and data types (image, tabular, or both) and to synthesize the existing evidence related to AI-based fracture detection. Peer-reviewed studies developing and validating AI for fracture detection were identified through searches in multiple electronic databases without time limitations. A hierarchical meta-analysis model was used to calculate pooled sensitivity and specificity. A diagnostic accuracy quality assessment was performed to evaluate bias and applicability. Of the 66 eligible studies, 54 identified fractures using imaging-related data, nine using tabular data, and three using both. Vertebral fractures were the most common outcome (n = 20), followed by hip fractures (n = 18). Hip fractures exhibited the highest pooled sensitivity (92%; 95% CI: 87–96, p< 0.01) and specificity (90%; 95% CI: 85–93, p< 0.01). Pooled sensitivity and specificity using image data (92%; 95% CI: 90–94, p< 0.01; and 91%; 95% CI: 88–93, p < 0.01) were higher than those using tabular data (81%; 95% CI: 77–85, p< 0.01; and 83%; 95% CI: 76–88, p < 0.01), respectively. Radiographs demonstrated the highest pooled sensitivity (94%; 95% CI: 90–96, p < 0.01) and specificity (92%; 95% CI: 89–94, p< 0.01). Patient selection and reference standards were major concerns in assessing diagnostic accuracy for bias and applicability. AI displays high diagnostic accuracy for various fracture outcomes, indicating potential utility in healthcare systems for fracture diagnosis. However, enhanced transparency in reporting and adherence to standardized guidelines are necessary to improve the clinical applicability of AI.

Review Registration: PROSPERO (CRD42021240359).

Author summary

Artificial Intelligence (AI) is increasingly employed to detect fractures by using various imaging modalities and data types. Our search of Medline (via PubMed), Web of Science, and IEEE revealed numerous primary studies demonstrating AI’s superior performance in fracture detection. This systematic review and meta-analysis is the first to assess and compare the diagnostic accuracy of AI models across different imaging modalities and data types for various fracture outcomes. We found that AI models achieve high accuracy in fracture detection, particularly with radiograph images. However, we identified significant flaws in study design and reporting, limiting real-world applicability. Few studies provided patient characteristics, and only half reported the hyperparameter selection process. Our findings underscore the benefits of using AI models with radiographs for fracture detection, as they outperform other imaging modalities. Despite similar results across modalities, inadequate methodology and reporting in AI model evaluations call for improvement. Considering AI’s high diagnostic performance, integrating it into existing fracture risk assessment tools could enhance patient identification and enable early intervention.

Introduction

Bone fractures represent a significant public health concern globally [1], particularly for individuals with osteoporosis [2]. Fractures contribute to work absences, disability, reduced quality of life, health complications, and increased healthcare costs, affecting individuals, families, and societies [3,4]. A meta-analysis of 113 studies reported the pooled cost of hospital treatment for a hip fracture after 12 months as $10,075, with total health and social care costs amounting to $43,669 per hip fracture [5].

Artificial Intelligence (AI), encompassing Machine Learning (ML) and Deep Learning (DL), has been extensively employed for fracture outcome prediction due to technological advancements and accessibility. Various imaging modalities, including X-rays [6,7], computed tomography (CT) [8,9], and magnetic resonance imaging (MRI) [10,11], have been used in fracture diagnosis and detection. AI can also predict fractures using tabular data, such as electronic medical records (structured patient-level data). However, few studies [12–14] have applied AI with tabular data in fracture prediction despite its growing importance over the past decade. Recent systematic reviews and meta-analyses have reported high accuracy for AI in fracture detection and classification. Kuo et al. [15] summarized 42 studies with 115 contingency tables, finding pooled sensitivity of 92% (95% CI: 88, 94) and specificity of 91% (95% CI: 88, 93). Yang et al. [16] reviewed 14 studies on orthopedic fractures, reporting pooled sensitivity and specificity of DL models as 87% (95% CI: 78, 93) and 91% (95% CI: 85, 95), respectively.

However, existing systematic reviews and meta-analyses have focused solely on image-based analyses, neglecting a comprehensive examination of the various imaging modalities and data types (image, tabular, or both). Despite AI's strong performance in both medical image analysis and tabular-data modeling, a critical gap exists in the current literature concerning the optimal choice of image modality and the choice among image, tabular, or combined data types. There is a lack of comprehensive guidance on the most effective selection of image modalities and data types for fracture diagnosis. This knowledge gap underscores the need for a systematic investigation to determine which image modality, and by extension which data type, yields the highest diagnostic accuracy and clinical relevance in AI algorithms. Addressing this gap will not only optimize the design of AI-based diagnostic tools but also enable healthcare practitioners to make informed decisions when selecting imaging modalities and data types for improved patient care.

Thus, this study primarily aims to evaluate the diagnostic accuracy of AI in fracture detection using diverse imaging modalities and data types, reflecting AI’s growing role in healthcare. Additionally, we seek to synthesize current evidence on AI-based fracture detection, offering a concise overview and discerning the strengths and limitations of various data types, whether image, tabular, or combined.

Materials and methods

Identification and selection of studies

This systematic review, registered with PROSPERO (CRD42021240359), follows PRISMA guidelines (S1 PRISMA Checklist) [17]. We searched Medline (via PubMed), Web of Science, and IEEE. The last search was conducted on December 15, 2022, and we manually searched bibliographies, citations, and related articles of included studies. S1 Text lists each search term. Two independent reviewers (JJ and JD) assessed study eligibility, resolving disagreements through discussion or involving a third author (BL) if necessary.

Eligible studies predicted fracture outcomes using structured patient-level health data (electronic health records and cohort study data) or image-related data (MRI, DXA, and X-ray). We excluded reviews, gray literature, non-human studies, and studies lacking machine learning or deep learning models, fracture outcomes, reported performance metrics (AUC, accuracy, sensitivity, or specificity), validation, or sufficient algorithm development details. We only considered studies published in English, without time restrictions.

Data extraction

All three categories of data were considered: image-related, tabular, and both. Image-type studies used MRI, DXA, CT, or X-ray data; tabular-type studies used structured electronic health records; combined studies used both data types. Two investigators (JJ and JD) independently evaluated study eligibility and extracted relevant data from articles meeting the inclusion criteria. A structured data collection form was used to capture general study characteristics, population, data preprocessing, clinical outcomes, analytical methods, and results. A third author (BL) resolved discrepancies if necessary. We constructed a contingency table (true positives, true negatives, false positives, and false negatives) for each study from the reported sensitivity, specificity, positive predictive value, and negative predictive value (S4 Table). If a study reported multiple sensitivity and specificity values, we used the highest sensitivity and specificity.
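
For illustration, such a back-calculation can be done as in the following R sketch, which assumes the reported sensitivity, specificity, and positive predictive value all refer to the same test set of known size; the function name and example values are ours, not taken from any included study.

```r
# Minimal sketch (not the authors' exact code): rebuild a 2x2 contingency
# table from reported sensitivity, specificity, PPV, and test-set size n,
# assuming all quantities refer to the same test set.
rebuild_2x2 <- function(sens, spec, ppv, n) {
  # PPV = sens*prev / (sens*prev + (1 - spec)*(1 - prev))  =>  solve for prev
  prev <- ppv * (1 - spec) / (sens * (1 - ppv) + ppv * (1 - spec))
  pos  <- round(n * prev)   # actual fracture cases
  neg  <- n - pos           # non-fracture cases
  data.frame(TP = round(sens * pos), FN = pos - round(sens * pos),
             TN = round(spec * neg), FP = neg - round(spec * neg))
}

rebuild_2x2(sens = 0.92, spec = 0.90, ppv = 0.85, n = 500)
```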

Statistical analysis

Meta-analyses were performed using a random-effects model to calculate pooled sensitivity and specificity based on the logit transformation [18,19], with Clopper-Pearson intervals used to compute 95% confidence intervals for each study [20]. We used a unified hierarchical summary receiver operating characteristic (HSROC) curve to investigate the relationship between logit-transformed sensitivity and specificity. We also calculated the diagnostic odds ratio, pooled with inverse-variance weighting under a random-effects model [21].
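
An illustrative sketch of this workflow with the 'meta' and 'mada' R packages used in this study (see Subgroup analysis below) is shown here; the three-study data frame and all counts are invented for illustration and are not the authors' data or code.

```r
library(meta)
library(mada)

# Toy contingency tables (one row per study); values are made up.
tables <- data.frame(
  study = c("Study A", "Study B", "Study C"),
  TP = c(176, 88, 240), FN = c(15, 12, 30),
  FP = c(31, 10, 25),   TN = c(278, 90, 205)
)

# Random-effects pooled sensitivity: logit transformation (sm = "PLOGIT"),
# with exact Clopper-Pearson CIs for each study (method.ci = "CP").
pooled_sens <- metaprop(event = TP, n = TP + FN, studlab = study,
                        data = tables, sm = "PLOGIT", method.ci = "CP")
# Pooled specificity: the same call with event = TN and n = TN + FP.

# Bivariate model relating logit sensitivity and logit specificity,
# from which the HSROC/SROC curve and its AUC are derived.
fit <- reitsma(tables[, c("TP", "FN", "FP", "TN")])
summary(fit)            # pooled estimates and AUC of the SROC curve
plot(fit, sroclwd = 2)  # HSROC curve

# Pooled diagnostic odds ratio under a random-effects
# (DerSimonian-Laird, inverse-variance) model.
dor <- madauni(tables[, c("TP", "FN", "FP", "TN")], type = "DOR")
summary(dor)
```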

Sensitivity analysis

The logit transformation does not account for the correlation between sensitivity and specificity or for threshold effects, so an alternative transformation is needed to assess robustness. Barendregt et al. [22] recommend the Freeman-Tukey double arcsine transformation over the logit transformation. Hence, we repeated the random-effects analysis using the Freeman-Tukey double arcsine transformation as a sensitivity analysis [22].
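
In the 'meta' package this amounts to changing a single argument; a sketch continuing the toy example above (and therefore assuming `tables` and `library(meta)` from that sketch):

```r
# Sensitivity analysis sketch: repeat the pooling with the Freeman-Tukey
# double arcsine transformation (sm = "PFT" in the 'meta' package).
pooled_sens_ft <- metaprop(event = TP, n = TP + FN, studlab = study,
                           data = tables, sm = "PFT", method.ci = "CP")
summary(pooled_sens_ft)
```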

Subgroup analysis

Two subgroup analyses were conducted: 1) by data type (image, tabular, or image and tabular) and 2) by image modality among the studies using image data. Statistical analyses were performed in R [23] with the 'meta' [24] and 'mada' [25] packages. A p-value of < 0.05 was considered statistically significant.
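
A minimal sketch of such a subgroup analysis with `metaprop` is shown below, continuing the toy example above; the `data_type` column is illustrative, and older versions of 'meta' name the subgroup argument `byvar`.

```r
# Subgroup pooling of sensitivity by data type (image, tabular, or both).
tables$data_type <- c("image", "tabular", "image")
metaprop(event = TP, n = TP + FN, studlab = study, data = tables,
         sm = "PLOGIT", method.ci = "CP", subgroup = data_type)
```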

Publication bias

We used contour-enhanced funnel plots [26] to assess publication bias for each fracture outcome and data type. Each data point in a contour-enhanced funnel plot represents an individual study, and the contour lines delineate the regions where studies would be expected to fall in the absence of bias; asymmetry suggests a deviation from expected publication patterns. We further employed the trim-and-fill method [22] to address publication bias. This statistical approach adjusts for potentially missing studies by imputing hypothetical “filled” studies and recalculating the effect size accordingly.
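
The sketch below shows one way to produce a contour-enhanced funnel plot and a trim-and-fill adjustment in the 'meta' package, applied to study-level log diagnostic odds ratios; it is our own illustrative workflow (continuing the toy data above), not the authors' code, and uses a 0.5 continuity correction to guard against zero cells.

```r
# Study-level log diagnostic odds ratios and their standard errors.
logDOR   <- with(tables, log(((TP + 0.5) * (TN + 0.5)) / ((FP + 0.5) * (FN + 0.5))))
seLogDOR <- with(tables, sqrt(1/(TP + 0.5) + 1/(FN + 0.5) + 1/(FP + 0.5) + 1/(TN + 0.5)))

m_dor <- metagen(TE = logDOR, seTE = seLogDOR, studlab = tables$study, sm = "OR")

# Contour-enhanced funnel plot: shaded contours mark conventional
# significance regions and help judge asymmetry.
# (Some 'meta' versions name this argument 'contour' instead of 'contour.levels'.)
funnel(m_dor, contour.levels = c(0.9, 0.95, 0.99),
       col.contour = c("gray75", "gray85", "gray95"))

# Trim-and-fill: impute hypothetical "filled" studies and re-pool.
trimfill(m_dor)
```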

Risk of bias and applicability

Two reviewers (JJ and JD) independently evaluated the risk of bias in each study using Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) [27], assessing four domains: patient selection, index test, reference standard, and flow and timing. The risk of applicability was evaluated with the first three domains.

Results

Study selection and characteristics

Our search identified 1,128 studies, yielding 717 unique ones after removing duplicates (Fig 1). We screened titles and abstracts and selected 496 studies for full-text review based on our inclusion criteria. We then excluded 254 studies for lacking sensitivity and specificity information (149 studies), not having fracture-related outcomes (75 studies), not using ML models (28 studies), or being survey or review articles (2 studies). We further removed 176 studies because no contingency table could be calculated from the provided information. Ultimately, 66 studies were included in our systematic review and meta-analysis.

Fig 1. Flow chart of the literature selection in PubMed, Web of Science, and Institute of Electrical and Electronics Engineers (search conducted on December 15, 2022).

*IEEE: Institute of Electrical and Electronics Engineers.

https://doi.org/10.1371/journal.pdig.0000438.g001

The selected studies were published between 2007 and 2022, with 73% (48 studies) published in the last three years (Table 1). The studies were conducted in various countries, including Asian countries (26 studies) [6,9,11,28–50], North American countries (19 studies) [14,34,36,51–66], European countries (14 studies) [13,59,67–78], Australia (1 study) [79] and Brazil (2 studies) [10,80] (Table 1). Four studies did not provide the country information [81–84].

Table 1. Fracture detection in the 66 selected studies using machine learning and deep learning models, and general study characteristics.

https://doi.org/10.1371/journal.pdig.0000438.t001

Fracture identification was performed using imaging-related data in 54 studies, tabular data in nine studies, and both imaging and tabular data in three. Of the 57 studies using imaging-related and combined data, 33 analyzed radiograph images [6,7,28–31,35–38,40–42,45,47–49,52–57,59,61,62,66–68,72–74,78], 12 analyzed computed tomography (CT) images [8,9,39,43,50,63,65,69,75,81–83], and the remaining studies analyzed other imaging modalities (S1 and S2 Tables). The most common fracture outcome was vertebral fracture (20 studies) [8,10,11,28,31,34,35,38,44,46,50,51,58,59,65,72,77,80,83,84], followed by hip fractures [6,13,29,32,33,37,39–43,48,53,62,64,66,68,79] and other fracture types (Table 1).

AI algorithms summary

Among the 54 studies that utilized imaging-related data, convolutional neural networks (CNNs), a deep learning approach, were the predominant choice, followed by approaches that adopted transfer learning. In some cases, the limited availability of labeled image data prompted the use of transfer learning [53,69], and certain studies used CNNs pre-trained on non-fracture-related radiological images [6,28,85]. Within the subset of nine studies involving tabular data, fully connected artificial neural networks were the prevailing choice; logistic regression and ensemble learning models, including Random Forest, Gradient Boosting, and XGBoost, were also commonly employed. Among the three studies that used both image and tabular data, support vector machines with various kernels were a notable choice [57,68].

Handling imbalanced data and data augmentation

Imbalanced fracture outcomes were reported in 48 studies (S3 Table). Only 12 studies addressed the handling of imbalanced outcomes during model development, using the Synthetic Minority Over-sampling Technique (SMOTE) [86] or undersampling [35]. Data augmentation was frequently utilized in image studies, including horizontal and vertical rotation [45,50,58,67,69,72], adding Gaussian noise [67], random rescaling and flipping [30,53], mirroring, and lighting and contrast adjustments [56].
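
As a simple illustration of one of these strategies, the base-R sketch below performs random undersampling of the majority class on a toy data set; the variable names and data are ours and do not come from any included study.

```r
# Random undersampling of the majority (non-fracture) class on toy data.
set.seed(42)
df <- data.frame(fracture = rbinom(1000, 1, 0.05),
                 feature1 = rnorm(1000),
                 feature2 = rnorm(1000))

minority <- df[df$fracture == 1, ]
majority <- df[df$fracture == 0, ]

# Keep every minority case plus an equally sized random sample of majority cases.
balanced <- rbind(minority,
                  majority[sample(nrow(majority), nrow(minority)), ])
table(balanced$fracture)
```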

Hyperparameter optimization

Thirty-six studies reported the detailed process for optimizing hyperparameters in the final selected models (S3 Table). Beyaz et al. utilized genetic algorithms to identify the optimal hyperparameters for their CNN architecture [67]. Liu et al. explored the impact of varying the number of hidden neurons in the output layer [32]. Nissinen et al. [72] employed two approaches for hyperparameter searches: random search [87] and hyperband [88].
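
For readers unfamiliar with random search [87], the sketch below shows its skeleton in R; `fit_and_score()` is a placeholder for whatever training-and-validation routine a given study would use, and the hyperparameter ranges are purely illustrative.

```r
# Schematic random hyperparameter search: sample configurations at random,
# score each one on validation data, and keep the best.
set.seed(1)
n_trials <- 20
configs <- data.frame(
  learning_rate = 10^runif(n_trials, -4, -1),                    # log-uniform
  batch_size    = sample(c(16, 32, 64, 128), n_trials, replace = TRUE),
  dropout       = runif(n_trials, 0, 0.5)
)

fit_and_score <- function(cfg) {   # placeholder: train a model, return validation AUC
  runif(1)
}

configs$val_auc <- sapply(seq_len(n_trials), function(i) fit_and_score(configs[i, ]))
best <- configs[which.max(configs$val_auc), ]
best
```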

Data split and validation in an external data set

Fifty-one studies reported the split of samples for model development (training) and validation (testing) (S3 Table). No universal rule for data separation was found. Various split ratios were used, e.g., 80% training and 20% testing [10,28,47,57,71], 90% training and 10% testing [32,33,56,81], and 80% training, 10% validation, and 10% testing [40,41,65,69]. Twenty studies reported cross-validation, using 20-fold [66], 10-fold [8,14,33,34,39,45,50,53,57,64,72,76,80,81], 5-fold [13,28,32,38,44,46,48,67,74,78,79], and 7-fold [83] schemes. Thirteen studies performed out-of-sample external validation [6,7,29–31,35,47,49,56,59,62,72,74]. Choi et al. [47] performed external tests using two distinct types of datasets: temporal data obtained in a different period from model development, and geographically separate data collected from a different center. Li et al. [35] utilized a dataset from another medical center that used a different plain radiographic technique.
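
For illustration, an 80/10/10 split of the kind reported above can be drawn in base R as follows; the data frame is a toy example of ours, not data from any included study.

```r
# 80% training / 10% validation / 10% test split on toy data.
set.seed(7)
df <- data.frame(id = 1:1000, fracture = rbinom(1000, 1, 0.1))
idx <- sample(c("train", "valid", "test"), nrow(df), replace = TRUE,
              prob = c(0.8, 0.1, 0.1))
parts <- split(df, idx)
sapply(parts, nrow)   # roughly 800 / 100 / 100 rows
```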

Meta-analysis

We extracted a contingency table for each of the 66 selected studies (S4 Table). The overall pooled sensitivity and specificity, calculated using the logit transformation, were 91% (95% CI: 88, 93) and 90% (95% CI: 88, 92), respectively (Table 2). The pooled sensitivities for hip and vertebral fractures were 92% (95% CI: 87–96) and 86% (95% CI: 82–89), respectively, while the pooled specificities were 90% (95% CI: 85–93) and 86% (95% CI: 81–90), respectively (Table 2). The unified hierarchical summary receiver operating characteristic curve for different fracture types is shown in Fig 2. The area under the curve (AUC) was highest for femoral neck fractures at 0.98, followed by other fractures (0.97), multiple fractures (0.93), hip fractures (0.91), wrist fractures (0.86), and vertebral fractures (0.84).

Fig 2. The hierarchical summary receiver operating characteristic curve for different fracture types in the meta-analysis.

A: Hip (18 studies), B: Vertebral (20 studies), C: Wrist (3 studies), D: Femoral Neck (4 studies), E: Multiple (11 studies), and F: Others (10 studies).

https://doi.org/10.1371/journal.pdig.0000438.g002

Table 2. Pooled sensitivities, specificities, and diagnostic odds ratios for 60 studies across different fracture outcomes.

Studies with only one selected fracture outcome (cervical spine, hand, lumbar spine, proximal humerus, supracondylar, and trabecular bone) were omitted.

https://doi.org/10.1371/journal.pdig.0000438.t002

Sensitivity analysis

The arcsine transformation yielded similar results, with a pooled sensitivity of 89% (95% CI: 87, 91) and specificity of 88% (95% CI: 86, 91). Among data types, studies using only image data exhibited superior diagnostic performance, with sensitivity and specificity of 91% (95% CI: 88, 93) and 89% (95% CI: 78, 91) under the arcsine transformation (Table 3). Studies employing radiographs displayed the highest sensitivity (92% [95% CI: 89, 95]) and specificity (90% [95% CI: 87, 93]) under the arcsine transformation (Table 4).

Table 3. Pooled sensitivities, specificities, and diagnostic odds ratios for 66 studies across different data types used.

https://doi.org/10.1371/journal.pdig.0000438.t003

Table 4. Pooled sensitivities, specificities, and diagnostic odds ratios for 54 studies (including three that used both tabular and image data) across different image modalities.

Studies with only one selected image modality (Radiograph + CT + MRI, Radiograph + MRI, UGWSI) were omitted.

https://doi.org/10.1371/journal.pdig.0000438.t004

Subgroup analysis

Among data types, studies using only image data exhibited superior diagnostic performance with sensitivity and specificity at 92% (95% CI: 90, 94) and 91% (95% CI: 88, 93), respectively, when using logit transformation (Table 3). Studies employing radiographs displayed the highest sensitivity (94% [95% CI: 90, 96]) and specificity (92% [95% CI: 89, 94]) using logit transformation (Table 4). The AUC for radiograph studies (0.94) was higher than studies using radiograph and CT together (0.89) or MRI alone (0.88). The diagnostic odds ratio (DOR) was highest for hip fractures at 99.50 (95% CI: 39.37, 251.48) compared to vertebral fractures (38.26 [95% CI: 21.36, 68.51]) (Table 2). The AUC for image data studies (0.96) was higher than that for those using tabular and images together (0.83) or tabular data alone (0.81) (Fig 3).

Fig 3. Unified hierarchical summary receiver operating characteristic curve for different data types in the meta-analysis.

A: image (54 studies), B: tabular (9 studies), and C: image and tabular (3 studies).

https://doi.org/10.1371/journal.pdig.0000438.g003

Publication bias

Publication bias was assessed for each fracture outcome and each data type used (S5 and S6 Tables, S1–S3 Figs). The contour-enhanced funnel plots illustrate the study distribution, and the added contours facilitate the identification of potential bias (S1–S3 Figs). Notably, asymmetric distributions were evident for the hip and vertebral fracture outcomes and for the studies that used image data only (S1 Fig and S3 Fig). This asymmetry implies possible publication bias, particularly pronounced among studies with smaller sample sizes. However, the trim-and-fill method corrected this asymmetry, rendering the distribution symmetrical (S2 Fig and S3 Fig). After adjusting for publication bias with the trim-and-fill method, the diagnostic odds ratio (DOR) remained statistically significant (S5 and S6 Tables).

Risk of bias and applicability

The assessment of bias and applicability for the 66 studies revealed moderate to low concerns (Table 5 and Fig 4). Patient selection and reference standards were the primary concerns for bias and applicability. Many studies did not report sample characteristics such as gender and age, limiting generalizability. Some studies did not report their patient selection or reference standard computation methods [62,75,78]. Threshold adjustments in some studies might have led to overfitting, reducing the generalizability of the models [72]. Most studies exhibited applicability concerns and were not easily generalizable to other populations. For example, one study [66] focused on patients visiting the emergency department for acute proximal femoral fracture, limiting generalizability to the general population. Another study included patients with existing vertebral fractures, likewise reducing generalizability to the general population. Data preprocessing often involved the removal of occult fractures, with some studies excluding radiographically occult fractures that required additional modalities for confirmation [53]. Other studies excluded images with uncertain, traumatic, or pathological fractures, or images of insufficient quality or resolution [58]. A few studies did not provide specific locations for fracture types or specify which ones were included [12,70].

Table 5. Methodological quality of the 66 included studies in the assessment of risk of bias and applicability.

https://doi.org/10.1371/journal.pdig.0000438.t005

Fig 4. Summary of the Quality Assessment of Diagnostic Accuracy Studies for the risk of bias and applicability in the included 66 studies.

The risk of bias was assessed in four domains: patient selection, index test, reference standard, and flow and timing. Applicability was evaluated in three domains: patient selection, index test, and reference standard.

https://doi.org/10.1371/journal.pdig.0000438.g004

Discussion

Our systematic review and meta-analysis offer the most current and comprehensive evaluation of the diagnostic accuracy of Artificial Intelligence (AI) for predicting various osteoporotic fracture outcomes using different imaging modalities and data types. This study represents the first systematic review and quantitative meta-analysis to assess and compare AI's diagnostic accuracy across data types and multiple fracture outcomes. Our analysis reveals four major findings. First, AI provides high classification accuracy for fracture detection when utilizing imaging data, with a pooled sensitivity of 92% (95% CI: 90, 94); convolutional neural networks with transfer learning exhibited particularly high accuracy in classifying fractures from image data. Second, our study comprehensively reviews diagnostic accuracy among different image modalities with AI. While all image modalities provide comparable results, AI with radiograph images yields the highest accuracy, with a pooled sensitivity of 94% (95% CI: 90, 96). Third, our sensitivity analysis using the arcsine transformation, complementing the primary analysis based on the logit transformation, supports the robustness of our findings: both approaches yielded similar pooled sensitivity and specificity, underscoring the reliability and consistency of our results. Fourth, significant flaws were observed in study design and reporting that limit AI's real-world applicability; for example, only a few studies described patient characteristics, and only half (n = 33) reported the hyperparameter selection process.

Our findings align with other systematic reviews and meta-analyses [15,16], showing that AI achieves high pooled sensitivity and specificity. However, inconsistent results have been observed when comparing different image modalities in fracture detection. External validation enables a more robust demonstration of clinical utility than simple internal train/test cross-validation, yet only thirteen of the sixty-six studies (20%) performed external validation. A key barrier to external validation is the lack of large, labeled datasets, reflecting both institutional resistance to data sharing over patient privacy concerns and the need for experts to label the datasets. Although external validation enhances the robustness of AI systems, it is not always advisable, as factors such as sample size and the diversity of the training set can attenuate its value. Two systematic reviews [89,90] provide valuable insights into the current limitations of AI studies. A broad discussion of possible solutions is necessary because methodological challenges, risk of bias, and applicability concerns can arise at all stages of AI development, including data curation, model selection, implementation, and validation. Both reviews recommend that researchers follow standardized reporting guidelines to determine the risk of bias and improve methodological quality assessment.

Our study has limitations. The major one is that only a few eligible studies employed tabular data or combined tabular and image data. Second, we excluded non-English-language articles, which may have overlooked studies published in other languages. Third, many of the included studies had study design flaws and were classified as raising high concern for bias and applicability, limiting the conclusions that can be drawn from the meta-analysis because studies with a high risk of bias and applicability concerns may overestimate algorithm performance.

This systematic review and meta-analysis have important implications for clinical practice. Given the high diagnostic performance of AI, these techniques could be integrated into existing fracture risk assessment tools to enhance the identification of patients at risk and facilitate early intervention. Healthcare professionals should be trained in interpreting and applying these methods in clinical practice.

This study observed superior prediction performance with single radiograph input data over multimodal imaging, which can be attributed to radiographs' consistent and standardized anatomical view, reducing the noise and variability inherent in multimodal inputs [91]. Radiographs precisely capture fracture-relevant features, while added modalities such as CT and MRI can dilute and possibly weaken these key features [92]. Multimodal inputs can also elevate overfitting risks, particularly with limited datasets [93]. Radiographs, being more accessible and cost-effective than CT or MRI, allow for larger, more representative datasets that enhance model performance. The decision between single radiographs and multimodal inputs should be rooted in the research context, data availability, and prediction objectives; despite the evident advantages of radiographs, specific scenarios may warrant multimodal integration for improved predictions. We also observed that relying solely on image data produced better AUC values than combining it with tabular data. Image data's richness and direct relevance to fracture detection offer clear diagnostic advantages [94], and convolutional neural networks (CNNs), the predominant models identified in our study, are adept at processing such data and capturing subtle fracture-related visual nuances [95]. In contrast, tabular data can introduce noise and inconsistencies. Using image data alone keeps the focus on vital visual features and offers a more standardized data format than diverse tabular inputs.

Further research is needed to address the limitations identified in the included studies and to explore the performance of specific ML and DL algorithms. Researchers should provide more detailed information about their study populations and methods, including patient selection, fracture type and location, and the reference standard used. Future studies should also investigate the impact of factors such as training dataset size, model architecture, and the inclusion of clinical and demographic variables on the diagnostic performance of AI. Such research will help develop more accurate and generalizable models for predicting osteoporotic fractures and inform evidence-based clinical practice. Several novel diagnostic meta-analysis methodologies have recently been introduced [96–98]. Nevertheless, because of the limited sample sizes of the selected studies focusing on fractures beyond vertebral and hip injuries, and of the studies involving tabular or combined tabular and image data, incorporating these methodologies into the present study was not feasible. While we acknowledge their potential applicability, the current study's characteristics led us to refrain from their implementation; we will apply these methodologies in forthcoming investigations as more comprehensive studies become available. To aid future researchers, we provide an overview of key challenges and their potential resolutions pertinent to applying machine learning or deep learning to fracture diagnosis (S7 Table).

In conclusion, our meta-analysis highlights the high diagnostic accuracy of AI in various fracture outcomes. As AI demonstrates reliable results in fracture detection, it holds the potential to streamline fracture diagnosis in healthcare systems. However, transparent reporting of study methods and designs for AI development and validation is essential to ensure their real-world applicability. By addressing the current research landscape’s limitations and promoting standardized guidelines, we can facilitate the integration of AI technologies into clinical practice and enhance the prediction of osteoporotic fractures, ultimately leading to improved patient care.

Supporting information

S1 Text. The search term used for each engine: 1) PubMed, 2) Web of Science, and 3) IEEE.

https://doi.org/10.1371/journal.pdig.0000438.s002

(DOCX)

S1 Table. Characteristics of the 57 selected studies: image modality, image data type, and data source.

https://doi.org/10.1371/journal.pdig.0000438.s003

(DOCX)

S2 Table. Data sources of the nine selected studies that used tabular data and the three studies (in bold) that used both tabular and image data.

https://doi.org/10.1371/journal.pdig.0000438.s004

(DOCX)

S3 Table. Characteristics of the 66 selected studies: imbalanced outcomes, techniques used for handling imbalance, data preprocessing, hyperparameter optimization, and performance measures used.

https://doi.org/10.1371/journal.pdig.0000438.s005

(DOCX)

S4 Table. A summary of the contingency table for 66 selected studies.

https://doi.org/10.1371/journal.pdig.0000438.s006

(DOCX)

S5 Table. Summary of Publication Bias Assessment across different fracture outcomes.

TF: Trim and Fill method, DOR: Diagnostic Odds Ratio, CI: Confidence Interval.

https://doi.org/10.1371/journal.pdig.0000438.s007

(DOCX)

S6 Table. Summary of Publication Bias Assessment across different data types.

TF: Trim and Fill method, DOR: Diagnostic Odds Ratio, CI: Confidence Interval.

https://doi.org/10.1371/journal.pdig.0000438.s008

(DOCX)

S7 Table. Overview of Key Challenges and Potential Resolutions in the Utilization of Machine Learning or Deep Learning for Fracture Diagnosis.

https://doi.org/10.1371/journal.pdig.0000438.s009

(DOCX)

S1 Fig. Contour-Enhanced Funnel Plot for Publication Bias Assessment across Different Fracture Outcomes.

https://doi.org/10.1371/journal.pdig.0000438.s010

(DOCX)

S2 Fig. Contour-Enhanced Funnel Plot for Publication Bias Assessment across Different Fracture Outcomes after Employing the Trim & Fill Method.

The open circle represents the “filled” studies from the Trim & Fill Method in each fracture outcome plot.

https://doi.org/10.1371/journal.pdig.0000438.s011

(DOCX)

S3 Fig. Contour-Enhanced Funnel Plot: Evaluating Publication Bias Across Various Data Types.

The top row illustrates the funnel plot encompassing all studies. The second row shows the Contour-Enhanced Funnel Plot for Publication Bias Assessment after employing the Trim & Fill Method. The open circle designates the studies “filled” through the Trim & Fill Method within each contour-enhanced funnel plot in the second row.

https://doi.org/10.1371/journal.pdig.0000438.s012

(DOCX)

Acknowledgments

This research was partially conducted under the affiliation of the Nevada Institute of Personalized Medicine, College of Sciences (QW, JJ, and JD), Department of Epidemiology and Biostatistics, School of Public Health (QW and JJ), Department of Mathematical Sciences, College of Sciences (BL), the University of Nevada, Las Vegas.

References

  1. 1. Court-Brown CM, Caesar B. Epidemiology of adult fractures: A review. Injury. 2006;37: 691–697. pmid:16814787
  2. 2. Wu A-M, Bisignano C, James SL, Abady GG, Abedi A, Abu-Gharbieh E, et al. Global, regional, and national burden of bone fractures in 204 countries and territories, 1990–2019: a systematic analysis from the Global Burden of Disease Study 2019. Lancet Healthy Longev. 2021;2: e580–e592. pmid:34723233
  3. 3. Pike C, Birnbaum HG, Schiller M, Sharma H, Burge R, Edgell ET. Direct and Indirect Costs of Non-Vertebral Fracture Patients with Osteoporosis in the US. PharmacoEconomics. 2010;28: 395–409. pmid:20402541
  4. 4. Borgström F, Karlsson L, Ortsäter G, Norton N, Halbout P, Cooper C, et al. Fragility fractures in Europe: burden, management and opportunities. Arch Osteoporos. 2020;15: 59. pmid:32306163
  5. 5. Williamson S, Landeiro F, McConnell T, Fulford-Smith L, Javaid MK, Judge A, et al. Costs of fragility hip fractures globally: a systematic review and meta-regression analysis. Osteoporos Int. 2017;28: 2791–2800. pmid:28748387
  6. 6. Cheng CT, Ho TY, Lee TY, Chang CC, Chou CC, Chen CC, et al. Application of a deep learning algorithm for detection and visualization of hip fractures on plain pelvic radiographs. Eur Radiol. 2019;29: 5469–5477. pmid:30937588
  7. 7. Bae J, Yu S, Oh J, Kim TH, Chung JH, Byun H, et al. External Validation of Deep Learning Algorithm for Detecting and Visualizing Femoral Neck Fracture Including Displaced and Non-displaced Fracture on Plain X-ray. J Digit Imaging. 2021;34: 1099–1109. pmid:34379216
  8. 8. Burns JE, Yao J, Summers RM. Vertebral body compression fractures and bone density: Automated detection and classification on CT Images. Radiology. 2017;284: 788–797. pmid:28301777
  9. 9. Inoue T, Maki S, Furuya T, Mikami Y, Mizutani M, Takada I, et al. Automated fracture screening using an object detection algorithm on whole-body trauma computed tomography. Sci Rep. 2022;12: 16549. pmid:36192521
  10. 10. Ramos J. S., de Aguiar E. J., Belizario I. V., Costa M. V. L., Maciel J. G., Cazzolato M. T., et al. Analysis of vertebrae without fracture on spine MRI to assess bone fragility: A Comparison of Traditional Machine Learning and Deep Learning. 2022; 78–83.
  11. 11. Yabu A, Hoshino M, Tabuchi H, Takahashi S, Masumoto H, Akada M, et al. Using artificial intelligence to diagnose fresh osteoporotic vertebral fractures on magnetic resonance images. Spine J. 2021;000: 1–7. pmid:33722728
  12. 12. Almog YA, Rai A, Zhang P, Moulaison A, Powell R, Mishra A, et al. Deep Learning with Electronic Health Records for Short-Term Fracture Risk Identification: Crystal Bone Algorithm Development and Validation. J Med Internet Res. 2020;22. pmid:32956069
  13. 13. Kruse C, Eiken P, Vestergaard P. Machine Learning Principles Can Improve Hip Fracture Prediction. Calcif Tissue Int. 2017;100: 348–360. pmid:28197643
  14. 14. Wu Q, Nasoz F, Jung J, Bhattarai B, Han MV. Machine Learning Approaches for Fracture Risk Assessment: A Comparative Analysis of Genomic and Phenotypic Data in 5130 Older Men. Calcif Tissue Int. 2020; 1–9. pmid:32728911
  15. 15. Kuo Rachel Y L, Harrison Conrad, Curran Terry-Ann, Jones Benjamin, Freethy Alexander, Cussons David, et al. Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis. Radiology. 2022;304: 50–62. pmid:35348381
  16. 16. Yang S, Yin B, Cao W, Feng C, Fan G, He S. Diagnostic accuracy of deep learning in orthopaedic fractures: a systematic review and meta-analysis. Clin Radiol. 2020;75: 713.e17–713.e28. pmid:32591230
  17. 17. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372: n71. pmid:33782057
  18. 18. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PMM. The diagnostic odds ratio: A single indicator of test performance. J Clin Epidemiol. 2003;56: 1129–1135. pmid:14615004
  19. 19. Sterne JAC, Gavaghan D, Egger M. Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. J Clin Epidemiol. 2000;53: 1119–1129. pmid:11106885
  20. 20. Clopper CJ, Pearson ES. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial. Biometrika. 1934;26: 404.
  21. 21. Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58: 882–893. pmid:16085191
  22. 22. Barendregt JJ, Doi SA, Lee YY, Norman RE, Vos T. Meta-analysis of prevalence. J Epidemiol Community Health. 2013;67: 974–978. pmid:23963506
  23. 23. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2019. Available: https://www.r-project.org/
  24. 24. Schwarzer G. meta: An R Package for Meta-Analysis. R News. 2007. Available: http://cran.r-project.org/doc/Rnews/
  25. 25. Doebler P, Holling H. Meta-Analysis of Diagnostic Accuracy with mada. Compr R Arch Netw. 2012; 1–15.
  26. 26. Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. J Clin Epidemiol. 2008;61: 991–996. pmid:18538991
  27. 27. Whiting PF, Rutjes AWW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: A Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies. Ann Intern Med. 2011;155: 529–536. pmid:22007046
  28. 28. Chen HY, Hsu BWY, Yin YK, Lin FH, Yang TH, Yang RS, et al. Application of deep learning algorithm to detect and visualize vertebral fractures on plain frontal radiographs. PLoS ONE. 2021;16: 1–10. pmid:33507982
  29. 29. Cheng CT, Chen CC, Cheng FJ, Chen HW, Su YS, Yeh CN, et al. A human-algorithm integration system for hip fracture detection on plain radiography: System development and validation study. JMIR Med Inform. 2020;8: 1–13. pmid:33245279
  30. 30. Cheng CT, Wang Y, Chen HW, Hsiao PM, Yeh CN, Hsieh CH, et al. A scalable physician-level deep learning algorithm detects universal trauma on pelvic radiographs. Nat Commun. 2021;12. pmid:33594071
  31. 31. Chou PH, Jou TH, Wu HH, Yao YC, Lin HH, Chang MC, et al. Ground truth generalizability affects performance of the artificial intelligence model in automated vertebral fracture detection on plain lateral radiographs of the spine. Spine J. 2022;22: 511–523. pmid:34737066
  32. 32. Liu Q, Cui X, Chou YC, Abbod MF, Lin J, Shieh JS. Ensemble artificial neural networks applied to predict the key risk factors of hip bone fracture for elders. Biomed Signal Process Control. 2015;21: 146–156.
  33. 33. Tseng WJ, Hung LW, Shieh JS, Abbod MF, Lin J. Hip fracture risk assessment: Artificial neural network outperforms conditional logistic regression in an age- and sex-matched case control study. BMC Musculoskelet Disord. 2013;14. pmid:23855555
  34. 34. Yeh LR, Zhang Y, Chen JH, Liu YL, Wang AC, Yang JY, et al. A deep learning-based method for the diagnosis of vertebral fractures on spine MRI: retrospective training and validation of ResNet. Eur Spine J. 2022;31: 2022–2030. pmid:35089420
  35. 35. Li YC, Chen HH, Horng-Shing Lu H, Hondar Wu HT, Chang MC, Chou PH. Can a Deep-learning Model for the Automated Detection of Vertebral Fractures Approach the Performance Level of Human Subspecialists? Clin Orthop Relat Res. 2021;479: 1598–1612. pmid:33651768
  36. 36. Yoon AP, Lee Y-L, Kane RL, Kuo C-F, Lin C, Chung KC. Development and Validation of a Deep Learning Model Using Convolutional Neural Networks to Identify Scaphoid Fractures in Radiographs. JAMA Netw Open. 2021;4: e216096. pmid:33956133
  37. 37. Mawatari T, Hayashida Y, Katsuragawa S, Yoshimatsu Y, Hamamura T, Anai K, et al. The effect of deep convolutional neural networks on radiologists’ performance in the detection of hip fractures on digital pelvic radiographs. Eur J Radiol. 2020;130: 109188. pmid:32721827
  38. 38. Murata K, Endo K, Aihara T, Suzuki H, Sawaji Y, Matsuoka Y, et al. Artificial intelligence for the detection of vertebral fractures on plain spinal radiography. Sci Rep. 2020;10: 1–8. pmid:33208824
  39. 39. Nishiyama KK, Ito M, Harada A, Boyd SK. Classification of women with and without hip fracture based on quantitative computed tomography and finite element analysis. Osteoporos Int. 2014;25: 619–626. pmid:23948875
  40. 40. Sato Y, Takegami Y, Asamoto T, Ono Y, Hidetoshi T, Goto R. Artificial intelligence improves the accuracy of residents in the diagnosis of hip fractures: a multicenter study. BMC Musculoskelet Disord. 2021; 1–10.
  41. 41. Urakawa T, Tanaka Y, Goto S, Matsuzawa H, Watanabe K, Endo N. Detecting intertrochanteric hip fractures with orthopedist-level accuracy using a deep convolutional neural network. Skeletal Radiol. 2019;48: 239–244. pmid:29955910
  42. 42. Yamada Y, Maki S, Kishida S, Nagai H, Arima J, Yamakawa N, et al. Automated classification of hip fractures using deep convolutional neural networks with orthopedic surgeon-level accuracy: ensemble decision-making with antero-posterior and lateral radiographs. Acta Orthop. 2020;91: 699–704. pmid:32783544
  43. 43. Yamamoto N, Rahman R, Yagi N, Hayashi K, Maruo A, Muratsu H, et al. An automated fracture detection from pelvic CT images with 3-D convolutional neural networks. 2020 Int Symp Community-Centric Syst CcS 2020. 2020; 3–8.
  44. 44. Yoda T, Maki S, Furuya T, Yokota H, Matsumoto K, Takaoka H, et al. Automated Differentiation Between Osteoporotic Vertebral Fracture and Malignant Vertebral Fracture on MRI Using a Deep Convolutional Neural Network. Spine. 2022;47: E347–E352. pmid:34919075
  45. 45. Chung SW, Han SS, Lee JW, Oh KS, Kim NR, Yoon JP, et al. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 2018;89: 468–473. pmid:29577791
  46. 46. Chen W, Liu X, Li K, Luo Y, Bai S, Wu J, et al. A deep-learning model for identifying fresh vertebral compression fractures on digital radiography. Eur Radiol. 2022;32: 1496–1505. pmid:34553256
  47. 47. Choi JW, Cho YJ, Lee S, Lee J, Lee S, Choi YH, et al. Using a Dual-Input Convolutional Neural Network for Automated Detection of Pediatric Supracondylar Fracture on Conventional Radiography. Invest Radiol. 2020;55: 101–110. pmid:31725064
  48. 48. Liu P, Lu L, Chen Y, Huo T, Xue M, Wang H, et al. Artificial intelligence to detect the femoral intertrochanteric fracture: The arrival of the intelligent-medicine era. Front Bioeng Biotechnol. 2022;10. Available: https://www.frontiersin.org/articles/10.3389/fbioe.2022.927926 pmid:36147533
  49. 49. Mu L, Qu T, Dong D, Li X, Pei Y, Wang Y, et al. Fine-Tuned Deep Convolutional Networks for the Detection of Femoral Neck Fractures on Pelvic Radiographs: A Multicenter Dataset Validation. IEEE Access. 2021;9: 78495–78503.
  50. 50. Li Y, Zhang Y, Zhang E, Chen Y, Wang Q, Liu K, et al. Differential diagnosis of benign and malignant vertebral fracture on CT using deep learning. Eur Radiol. 2021. pmid:33993335
  51. 51. Derkatch S, Kirby C, Kimelman D, Jozani MJ, Michael Davidson J, Leslie WD. Identification of vertebral fractures by convolutional neural networks to predict nonvertebral and hip fractures: A Registry-based Cohort Study of Dual X-ray Absorptiometry. Radiology. 2019;293: 404–411. pmid:31526255
  52. 52. Guermazi A, Tannoury C, Kompel AJ, Murakami AM, Ducarouge A, Gillibert A, et al. Improving Radiographic Fracture Recognition Performance and Efficiency Using Artificial Intelligence. Radiology. 2022;302: 627–636. pmid:34931859
  53. 53. Gupta V, Demirer M, Bigelow M, Yu SM, Yu JS, Prevedello LM, et al. Using Transfer Learning and Class Activation Maps Supporting Detection and Localization of Femoral Fractures on Anteroposterior Radiographs. Proc—Int Symp Biomed Imaging. 2020;2020-April: 1526–1529.
  54. 54. Hayashi D, Kompel AJ, Ventre J, Ducarouge A, Nguyen T, Regnard NE, et al. Automated detection of acute appendicular skeletal fractures in pediatric patients using deep learning. Skelet Radiol. 2022;51: 2129–2139. pmid:35522332
  55. 55. Kitamura G. Deep learning evaluation of pelvic radiographs for position, hardware presence, and fracture detection. Eur J Radiol. 2020;130: 109139. pmid:32623269
  56. 56. Lindsey R, Daluiski A, Chopra S, Lachapelle A, Mozer M, Sicular S, et al. Deep neural network improves fracture detection by clinicians. Proc Natl Acad Sci U S A. 2018;115: 11591–11596. pmid:30348771
  57. 57. Mehta SD, Sebro R. Computer-Aided Detection of Incidental Lumbar Spine Fractures from Routine Dual-Energy X-Ray Absorptiometry (DEXA) Studies Using a Support Vector Machine (SVM) Classifier. J Digit Imaging. 2020;33: 204–210. pmid:31062114
  58. 58. Monchka BA, Kimelman D, Lix LM, Leslie WD. Feasibility of a generalized convolutional neural network for automated identification of vertebral compression fractures: The Manitoba Bone Mineral Density Registry. Bone. 2021;150: 116017. pmid:34020078
  59. 59. Monchka BA, Schousboe JT, Davidson MJ, Kimelman D, Hans D, Raina P, et al. Development of a manufacturer-independent convolutional neural network for the automated identification of vertebral compression fractures in vertebral fracture assessment images using active learning. Bone. 2022;161: 116427. pmid:35489707
  60. 60. Mutasa S, Varada S, Goel A, Wong TT, Rasiej MJ. Advanced Deep Learning Techniques Applied to Automated Femoral Neck Fracture Detection and Classification. J Digit Imaging. 2020;33: 1209–1217. pmid:32583277
  61. 61. Nguyen T, Maarek R, Hermann AL, Kammoun A, Marchi A, Khelifi-Touhami MR, et al. Assessment of an artificial intelligence aid for the detection of appendicular skeletal fractures in children and young adults by senior and junior radiologists. Pediatr Radiol. 2022;52: 2215–2226. pmid:36169667
  62. 62. Oakden-rayner L, Gale W, Bonham TA, Lungren MP, Carneiro G, Bradley AP, et al. Validation and algorithmic audit of a deep learning system for the detection of proximal femoral fractures in patients in the emergency department: a diagnostic accuracy study. The Lancet. 2022;7500: 4–8. pmid:35396184
  63. 63. JE S, Osler P, Paul AB, Kunst M. CT Cervical Spine Fracture Detection Using a Convolutional Neural Network. AJNR Am J Neuroradiol. 2021;42: 1341–1347. pmid:34255730
  64. 64. Su Y, Kwok TCY, Cummings SR, Yip BHK, Cawthon PM. Can Classification and Regression Tree Analysis Help Identify Clinically Meaningful Risk Groups for Hip Fracture Prediction in Older American Men (The MrOS Cohort Study)? JBMR Plus. 2019;3: 1–6. pmid:31687643
  65. 65. Tomita N, Cheung YY, Hassanpour S. Deep neural networks for automatic detection of osteoporotic vertebral fractures on CT scans. Comput Biol Med. 2018;98: 8–15. pmid:29758455
  66. 66. Yu JS, Yu SM, Erdal BS, Demirer M, Gupta V, Bigelow M, et al. Detection and localisation of hip fractures on anteroposterior radiographs with artificial intelligence: proof of concept. Clin Radiol. 2020;75: 237.e1-237.e9. pmid:31787211
  67. 67. Beyaz S, Açici K, Sümer E. Femoral neck fracture detection in X-ray images using deep learning and genetic algorithm approaches. Jt Dis Relat Surg. 2020;31: 175–183. pmid:32584712
  68. 68. Galassi A, Martín-Guerrero JD, Villamor E, Monserrat C, Rupérez MJ. Risk Assessment of Hip Fracture Based on Machine Learning. Appl Bionics Biomech. 2020. pmid:33425008
  69. 69. Kim DH, MacKinnon T. Artificial intelligence in fracture detection: transfer learning from deep convolutional neural networks. Clin Radiol. 2018;73: 439–445. pmid:29269036
  70. 70. Lemineur G, Harba R, Kilic N, Ucan ON, Osman O, Benhamou L. Efficient estimation of osteoporosis using Artificial Neural Networks. IECON Proc Ind Electron Conf. 2007; 3039–3044.
  71. 71. Minonzio JG, Cataldo B, Olivares R, Ramiandrisoa D, Soto R, Crawford B, et al. Automatic classifying of patients with non-traumatic fractures based on ultrasonic guided wave spectrum image using a dynamic support vector machine. IEEE Access. 2020;8: 194752–194764.
  72. 72. Nissinen T, Suoranta S, Saavalainen T, Sund R, Hurskainen O. Detecting pathological features and predicting fracture risk from dual-energy X-ray absorptiometry images using deep learning. Bone Rep. 2021;14. pmid:33997147
  73. 73. Ozkaya E, Topal FE, Bulut T, Gursoy M, Ozuysal M, Karakaya Z. Evaluation of an artificial intelligence system for diagnosing scaphoid fracture on direct radiography. Eur J Trauma Emerg Surg. 2022;48: 585–592. pmid:32862314
  74. 74. Raisuddin AM, Vaattovaara E, Nevalainen M, Nikki M, Järvenpää E, Makkonen K, et al. Critical evaluation of deep neural networks for wrist fracture detection. Sci Rep. 2021;11: 6006. pmid:33727668
  75. 75. Regnard NE, Lanseur B, Ventre J, Ducarouge A, Clovis L, Lassalle L, et al. Assessment of performances of a deep learning algorithm for the detection of limbs and pelvic fractures, dislocations, focal bone lesions, and elbow effusions on trauma X-rays. Eur J Radiol. 2022;154: 110447. pmid:35921795
  76. 76. Rosenberg GS, Cina A, Schiró GR, Giorgi PD, Gueorguiev B, Alini M, et al. Artificial Intelligence Accurately Detects Traumatic Thoracolumbar Fractures on Sagittal Radiographs. Medicina (Mex). 2022;58: 998. pmid:35893113
  77. 77. Ulivieri FM, Rinaudo L, Piodi LP, Messina C, Sconfienza LM, Sardanelli F, et al. Bone strain index as a predictor of further vertebral fracture in osteoporotic women: An artificial intelligence-based analysis. PLoS ONE. 2021;16: 1–13. pmid:33556061
  78. 78. Üreten K, Sevinç HF, İğdeli U, Onay A, Maraş Y. Use of deep learning methods for hand fracture detection from plain hand radiographs. Ulus Travma Acil Cerrahi Derg. 2022;28: 196–201. pmid:35099027
  79. 79. Ho-Le TP, Center JR, Eisman JA, Nguyen TV, Nguyen HT. Prediction of hip fracture in post-menopausal women using artificial neural network approach. Proc Annu Int Conf IEEE Eng Med Biol Soc EMBS. 2017; 4207–4210. pmid:29060825
  80. 80. Del Lama RS, Candido RM, Chiari-Correia NS, Nogueira-Barbosa MH, de Azevedo-Marques PM, Tinós R. Computer-Aided Diagnosis of Vertebral Compression Fractures Using Convolutional Neural Networks and Radiomics. J Digit Imaging. 2022;35: 446–458. pmid:35132524
  81. 81. Korfiatis VC, Tassani S, Matsopoulos GK. A New Ensemble Classification System For Fracture Zone Prediction Using Imbalanced Micro-CT Bone Morphometrical Data. IEEE J Biomed Health Inform. 2018;22: 1189–1196. pmid:28692998
  82. 82. Raghavendra U, Bhat NS, Gudigar A, Acharya UR. Automated system for the detection of thoracolumbar fractures using a CNN architecture. Future Gener Comput Syst. 2018;85: 184–189.
  83. 83. Salehinejad H, Ho E, Lin H, Crivellaro P, Samorodova O, Arciniegas MT, et al. Deep Sequential Learning For Cervical Spine Fracture Detection On Computed Tomography Imaging. IEEE 18th Int Symp Biomed Imaging. 2021; 1911–1914.
  84. 84. Yuzhao W, Tian B, Tong L, Lang H. Osteoporotic Vertebral Fracture Classification in X-rays Based on a Multi-modal Semantic Consistency Network. J BIONIC Eng. 2022;19: 1816–1829.
  85. 85. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2016: 2818–2826.
  86. 86. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002;16: 321–357.
  87. 87. Bergstra J, Bengio Y. Random Search For Hyper-Parameter Optimization. J Mach Learn Res. 2012;13: 281–305.
  88. 88. Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A. Hyperband: A novel bandit-based approach to hyperparameter optimization. J Mach Learn Res. 2018;18: 1–52.
  89. 89. Zhou Q, Chen Z, Cao Y, Peng S. Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review. Npj Digit Med. 2021;4: 1–12. pmid:34711955
  90. 90. Navarro CLA, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ. 2021;375: n2281. pmid:34670780
  91. 91. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18: 500–510. pmid:29777175
  92. 92. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging—“how-to” guide and critical reflection. Insights Imaging. 2020;11: 91. pmid:32785796
  93. 93. Boehm KM, Khosravi P, Vanguri R, Gao J, Shah SP. Harnessing multimodal data integration to advance precision oncology. Nat Rev Cancer. 2022;22: 114–126. pmid:34663944
  94. 94. Krupinski EA. Current perspectives in medical image perception. Atten Percept Psychophys. 2010;72: 1205–1217. pmid:20601701
  95. 95. Shorten C, Khoshgoftaar TM. A survey on Image Data Augmentation for Deep Learning. J Big Data. 2019;6: 60.
  96. 96. Preisser JS, Inan G, Powers JM, Chu H. A population-averaged approach to diagnostic test meta-analysis. Biom J. 2019;61: 126–137. pmid:30370548
  97. 97. Ma Xiaoye, Chen Yong, Stijnen Theo, Chu Haitao. Meta-Analysis of Diagnostic Tests. In: Handbook of Meta-Analysis. Chapman and Hall/CRC; 2020.
  98. 98. Liu Z, Al Amer FM, Xiao M, Xu C, Furuya-Kanamori L, Hong H, et al. The normality assumption on between-study random effects was questionable in a considerable number of Cochrane meta-analyses. BMC Med. 2023;21: 112. pmid:36978059