Abstract
Artificial Intelligence (AI), encompassing Machine Learning and Deep Learning, has increasingly been applied to fracture detection using diverse imaging modalities and data types. This systematic review and meta-analysis aimed to assess the efficacy of AI in detecting fractures through various imaging modalities and data types (image, tabular, or both) and to synthesize the existing evidence related to AI-based fracture detection. Peer-reviewed studies developing and validating AI for fracture detection were identified through searches in multiple electronic databases without time limitations. A hierarchical meta-analysis model was used to calculate pooled sensitivity and specificity. A diagnostic accuracy quality assessment was performed to evaluate bias and applicability. Of the 66 eligible studies, 54 identified fractures using imaging-related data, nine using tabular data, and three using both. Vertebral fractures were the most common outcome (n = 20), followed by hip fractures (n = 18). Hip fractures exhibited the highest pooled sensitivity (92%; 95% CI: 87–96, p < 0.01) and specificity (90%; 95% CI: 85–93, p < 0.01). Pooled sensitivity and specificity using image data (92%; 95% CI: 90–94, p < 0.01; and 91%; 95% CI: 88–93, p < 0.01) were higher than those using tabular data (81%; 95% CI: 77–85, p < 0.01; and 83%; 95% CI: 76–88, p < 0.01), respectively. Radiographs demonstrated the highest pooled sensitivity (94%; 95% CI: 90–96, p < 0.01) and specificity (92%; 95% CI: 89–94, p < 0.01). Patient selection and reference standards were major concerns in assessing diagnostic accuracy for bias and applicability. AI displays high diagnostic accuracy for various fracture outcomes, indicating potential utility in healthcare systems for fracture diagnosis. However, enhanced transparency in reporting and adherence to standardized guidelines are necessary to improve the clinical applicability of AI.
Review Registration: PROSPERO (CRD42021240359).
Author summary
Artificial Intelligence (AI) is increasingly employed to detect fractures by using various imaging modalities and data types. Our search of Medline (via PubMed), Web of Science, and IEEE revealed numerous primary studies demonstrating AI’s superior performance in fracture detection. This systematic review and meta-analysis is the first to assess and compare the diagnostic accuracy of AI models across different imaging modalities and data types for various fracture outcomes. We found that AI models achieve high accuracy in fracture detection, particularly with radiograph images. However, we identified significant flaws in study design and reporting, limiting real-world applicability. Few studies provided patient characteristics, and only half reported the hyperparameter selection process. Our findings underscore the benefits of using AI models with radiographs for fracture detection, as they outperform other imaging modalities. Despite similar results across modalities, inadequate methodology and reporting in AI model evaluations call for improvement. Considering AI’s high diagnostic performance, integrating it into existing fracture risk assessment tools could enhance patient identification and enable early intervention.
Citation: Jung J, Dai J, Liu B, Wu Q (2024) Artificial intelligence in fracture detection with different image modalities and data types: A systematic review and meta-analysis. PLOS Digit Health 3(1): e0000438. https://doi.org/10.1371/journal.pdig.0000438
Editor: Martin G. Frasch, University of Washington, UNITED STATES
Received: May 3, 2023; Accepted: December 25, 2023; Published: January 30, 2024
Copyright: © 2024 Jung et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data generated or analyzed during the study are included in the published paper.
Funding: The research and analysis described in the current publication were supported by a grant (R21MD013681 to QW) from the National Institute on Minority Health and Health Disparities and a grant (R01AG080017 to QW) from the National Institute of Aging. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Bone fractures represent a significant public health concern globally [1], particularly for individuals with osteoporosis [2]. Fractures contribute to work absences, disability, reduced quality of life, health complications, and increased healthcare costs, affecting individuals, families, and societies [3,4]. A meta-analysis of 113 studies reported the pooled cost of hospital treatment for a hip fracture after 12 months as $10,075, with total health and social care costs amounting to $43,669 per hip fracture [5].
Artificial Intelligence (AI), encompassing Machine Learning (ML) and Deep Learning (DL), has been extensively employed for fracture outcome prediction due to technological advancements and accessibility. Various imaging modalities, including X-rays [6,7], computed tomography (CT) [8,9], and magnetic resonance imaging (MRI) [10,11], have been used in fracture diagnosis and detection. AI can also predict fractures using tabular data, such as electronic medical records (structured patient-level data). However, few studies [12–14] have applied AI with tabular data in fracture prediction despite its growing importance over the past decade. Recent systematic reviews and meta-analyses have reported high accuracy for AI in fracture detection and classification. Kuo et al. [15] summarized 42 studies with 115 contingency tables, finding pooled sensitivity of 92% (95% CI: 88, 94) and specificity of 91% (95% CI: 88, 93). Yang et al. [16] reviewed 14 studies on orthopedic fractures, reporting pooled sensitivity and specificity of DL models as 87% (95% CI: 78, 93) and 91% (95% CI: 85, 95), respectively.
However, existing systematic reviews and meta-analyses focused solely on image-based analyses, neglecting comprehensive examination of various imaging modalities and data types (image, tabular, or both). Despite the superior performance of AI in medical image analysis and with tabular data, a critical gap exists in the current literature concerning the optimal choice of image modalities and the choice between image, tabular, or combined data types. There is a lack of comprehensive guidance on the most effective selection of image modalities and data types for fracture diagnosis. This gap in knowledge underscores the need for systematic investigation to determine which image modality, and by extension, which data type, yields the highest diagnostic accuracy and clinical relevance in AI algorithms. Addressing this gap will not only optimize the design of AI-based diagnostic tools but also enable healthcare practitioners to make informed decisions when selecting appropriate imaging modalities and data types for improved patient care.
Thus, this study primarily aims to evaluate the diagnostic accuracy of AI in fracture detection using diverse imaging modalities and data types, reflecting AI’s growing role in healthcare. Additionally, we seek to synthesize current evidence on AI-based fracture detection, offering a concise overview and discerning the strengths and limitations of various data types, whether image, tabular, or combined.
Materials and methods
Identification and selection of studies
This systematic review, registered with PROSPERO (CRD42021240359), follows PRISMA guidelines (S1 PRISMA Checklist) [17]. We searched Medline (via PubMed), Web of Science, and IEEE. The last search was conducted on December 15, 2022, and we manually searched bibliographies, citations, and related articles of included studies. S1 Text lists each search term. Two independent reviewers (JJ and JD) assessed study eligibility, resolving disagreements through discussion or involving a third author (BL) if necessary.
Eligible studies predicted fracture outcomes using structured patient-level health data (electronic health records and cohort studies data) and image-related data (MRI, DXA, and X-ray). We excluded reviews, gray literature, and non-human subject studies, as well as studies that lacked machine learning or deep learning models, fracture outcomes, performance measures (AUC, accuracy, sensitivity, specificity), validation, or sufficient detail on algorithm development. We only considered studies published in English, without time restrictions.
Data extraction
All three categories of data were considered: image-related, tabular, and both. Image-type studies used MRI, DXA, CT, or X-ray; tabular-type studies used structured electronic health records data; image and tabular studies used both data types. Two investigators (JJ and JD) independently evaluated study eligibility, extracting relevant data for articles meeting inclusion criteria. A structured data collection form was used to capture general study characteristics, population, data preprocessing, clinical outcomes, analytical methods, and results. A third author (BL) resolved discrepancies if necessary. We constructed the contingency table (true positive, true negative, false positive, and false negative) based on the provided information of sensitivity, specificity, positive predictive value, and negative predictive value for each study (S4 Table). If the study reported multiple sensitivity and specificity, we used the highest sensitivity and specificity.
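The reconstruction of a contingency table from reported rates can be sketched as follows. This is a minimal illustration and not the authors' extraction procedure: the function names, the rounding rule, and the example numbers are assumptions. It rebuilds the four cells from sensitivity, specificity, and the class sizes, and shows how the positive predictive value can recover the prevalence when class sizes are not reported directly.

```python
def contingency_from_rates(sensitivity, specificity, n_positive, n_negative):
    """Rebuild a 2x2 table from reported rates and class sizes.

    Counts are rounded to the nearest integer, so the result is an
    approximation of the table the study authors actually observed.
    """
    tp = round(sensitivity * n_positive)
    fn = n_positive - tp
    tn = round(specificity * n_negative)
    fp = n_negative - tn
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


def prevalence_from_ppv(sensitivity, specificity, ppv):
    """Solve PPV = se*p / (se*p + (1 - sp)*(1 - p)) for the prevalence p."""
    numerator = ppv * (1.0 - specificity)
    return numerator / (sensitivity * (1.0 - ppv) + numerator)


# Hypothetical study: 200 fracture cases and 300 non-fracture cases.
table = contingency_from_rates(0.92, 0.90, 200, 300)
print(table)

# Hypothetical study that reports PPV instead of class sizes.
print(prevalence_from_ppv(0.92, 0.90, 0.86))
```

When only a total sample size is available, the recovered prevalence can be multiplied by that total to estimate the two class sizes before calling the first function.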
Statistical analysis
Meta-analyses were performed using a random-effects model to calculate the pooled sensitivity and specificity based on logit transformation [18,19], using the Clopper-Pearson interval to calculate 95% confidence intervals for each study [20]. We used a unified hierarchical summary receiver operating characteristic curve (HSROC) to investigate the relationship between logit-transformed sensitivity and specificity. We calculated the diagnostic odds ratio and used inverse variance weighting for pooling with random effect models [21].
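As an illustration of pooling on the logit scale, the sketch below combines per-study proportions (for example, true positives out of all fracture cases) with inverse-variance weights under a random-effects model. It is a simplified stand-in for what the R packages do: the DerSimonian-Laird estimator of between-study variance, the 0.5 continuity correction, and the study counts are all assumptions, since the text does not state these details.

```python
import math


def inverse_logit(t):
    return 1.0 / (1.0 + math.exp(-t))


def pool_logit_random_effects(events, totals):
    """Pool proportions on the logit scale (DerSimonian-Laird random effects).

    Returns (pooled proportion, lower 95% limit, upper 95% limit, tau^2).
    """
    y, v = [], []
    for x, n in zip(events, totals):
        # Continuity correction only when a cell is zero, so the logit is finite.
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        y.append(math.log(x / (n - x)))
        v.append(1.0 / x + 1.0 / (n - x))

    # Fixed-effect step: inverse-variance weights and Cochran's Q.
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects step: add tau^2 to every within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return (inverse_logit(mu), inverse_logit(mu - 1.96 * se),
            inverse_logit(mu + 1.96 * se), tau2)


# Hypothetical true-positive counts and fracture-case totals from four studies.
pooled, lower, upper, tau2 = pool_logit_random_effects([88, 45, 170, 61], [100, 50, 200, 64])
print(round(pooled, 3), round(lower, 3), round(upper, 3), round(tau2, 3))
```

The same function applies to specificity by passing true negatives and non-fracture totals.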
Sensitivity analysis
The logit transformation does not account for the correlation between sensitivity and specificity or for threshold effects, so an additional model is desirable to capture what it misses. Barendregt et al. [22] recommend using the Freeman-Tukey double arcsine transformation instead of the logit transformation. Hence, we used the Freeman-Tukey double arcsine transformation in a random-effects model as a sensitivity analysis [22].
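For readers unfamiliar with the transformation, a minimal sketch is given below. It uses the classic form of the Freeman-Tukey double arcsine with approximate variance 1/(n + 0.5); some software uses a version scaled by one half, and the simple back-transformation shown here is an approximation chosen for brevity, so both are assumptions rather than the exact routine used in the analysis.

```python
import math


def freeman_tukey(x, n):
    """Double arcsine transform of x events out of n, with approximate variance."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1.0 / (n + 0.5)


def back_transform(t):
    """Approximate inverse: for large n, t is close to 2 * asin(sqrt(p))."""
    return math.sin(t / 2.0) ** 2


# Hypothetical study: 50 detected fractures out of 100 fracture cases.
t, var = freeman_tukey(50, 100)
print(t, var, back_transform(t))

# Unlike the logit, the transform stays finite at 0% and 100%.
print(freeman_tukey(0, 10), freeman_tukey(10, 10))
```

The transformed values and variances can then be pooled with the same inverse-variance weighting used for the logit scale.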
Subgroup analysis
Two subgroup analyses were conducted: 1) three data types (images, tabular, or images and tabular) and 2) different image modalities among image data used in AI. Statistical analysis was performed using R [23], with ‘meta’ [24] and ‘mada’ [25] packages. A p-value of < 0.05 was considered statistically significant.
Publication bias
We utilized the contour-enhanced funnel plot [26] to illustrate the assessment of publication bias for each fracture outcome and data type used. Each data point in the contour-enhanced funnel plot represents an individual study, and the plot incorporates contour lines that delineate expected areas of symmetry in the absence of bias. The plot provides insights into potential publication bias, with asymmetry suggesting a deviation from expected publication patterns. We further employed the trim-and-fill method to address publication bias [22]. This statistical approach helps adjust for potentially missing studies due to publication bias by imputing hypothetical “filled” studies and recalculating the effect size accordingly.
Risk of bias and applicability
Two reviewers (JJ and JD) independently evaluated the risk of bias in each study using Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) [27], assessing four domains: patient selection, index test, reference standard, and flow and timing. The risk of applicability was evaluated with the first three domains.
Results
Study selection and characteristics
Our search identified 1,128 studies, yielding 717 unique ones after removing duplicates (Fig 1). We screened titles and abstracts and selected 496 studies for full-text review based on our inclusion criteria. We then excluded 254 studies for lacking sensitivity and specificity information (149 studies), not having fracture-related outcomes (75 studies), not using ML models (28 studies), or being survey or review articles (2 studies). We further removed 176 studies because no contingency table could be calculated from the provided information. Ultimately, 66 studies were included in our systematic review and meta-analysis.
*IEEE: Institute of Electrical and Electronics Engineers.
The selected studies were published between 2007 and 2022, with 73% (48 studies) published in the last three years (Table 1). The studies were conducted in various countries, including Asian countries (26 studies) [6,9,11,28–50], North American countries (19 studies) [14,34,36,51–66], European countries (14 studies) [13,59,67–78], Australia (1 study) [79] and Brazil (2 studies) [10,80] (Table 1). Four studies did not provide the country information [81–84].
Fracture identification was performed using imaging-related data in 54 studies, tabular data in nine studies, and imaging and tabular data in three. Of the 57 studies using imaging-related and combined data, 33 analyzed radiograph images [6,7,28–31,35–38,40–42,45,47–49,52–57,59,61,62,66–68,72–74,78], 12 analyzed computed tomography (CT) images [8,9,39,43,50,63,65,69,75,81–83], and the remaining studies analyzed other imaging modalities (S1 Table and S2 Table). The most common fracture outcome was vertebral fracture (20 studies) [8,10,11,28,31,34,35,38,44,46,50,51,58,59,65,72,77,80,83,84], followed by hip fracture (18 studies) [6,13,29,32,33,37,39–43,48,53,62,64,66,68,79] and other fracture types (Table 1).
AI algorithms summary
Among the 54 studies that utilized imaging-related data, convolutional neural networks (CNN), a deep learning approach, emerged as the predominant choice, followed by instances where transfer learning was adopted. In some cases, the limited availability of labeled image data prompted the utilization of transfer learning [53,69], and certain studies incorporated pre-trained CNNs with non-fracture-related radiological images [6,28,85]. Within the subset of nine studies involving tabular data, the prevailing preference was for fully connected artificial neural networks. Logistic regression and ensemble learning models, including Random Forest, Gradient Boosting, and XGBoost, were also commonly employed. Among the three studies that harnessed both image and tabular data, a notable trend was the adoption of the support vector machine with various kernel models [57,68].
Handling imbalanced data and data augmentation
Imbalanced fracture outcomes were reported in 48 studies (S3 Table). Only 12 studies addressed the handling of imbalanced outcomes during model development, using Synthetic Minority Over-sampling Technique (SMOTE) [86] or undersampling [35]. Data augmentation was frequently utilized in image studies, including horizontal and vertical rotation [45,50,58,67,69,72], adding Gaussian noise [67], random rescaling and flipping [30,53], mirroring, and lighting and contrast adjustments [56].
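Of the two balancing strategies named above, undersampling is the simpler to illustrate. The sketch below shows plain random undersampling of the majority class. It is a generic example, not the procedure of any included study: the function name, the binary 0/1 labels, and the 10% fracture rate are assumptions.

```python
import random


def random_undersample(labels, seed=0):
    """Keep every minority-class sample and an equal-size random subset of the majority.

    Returns the sorted indices of the retained samples.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    kept_majority = rng.sample(majority, len(minority))
    return sorted(minority + kept_majority)


# Hypothetical training set: 10 fracture cases among 100 patients.
labels = [1] * 10 + [0] * 90
kept = random_undersample(labels)
print(len(kept), sum(labels[i] for i in kept))
```

Balancing of this kind is normally applied to the training portion only, so that the test set keeps the original class distribution.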
Hyperparameter optimization
Thirty-six studies reported the detailed process for optimizing hyperparameters in the final selected models (S3 Table). Beyaz et al. utilized genetic algorithms to identify the optimal hyperparameters for their CNN architecture [67]. Liu et al. explored the impact of varying the number of hidden neurons in the output layer [32]. Nissinen et al. [72] employed two approaches for hyperparameter searches: random search [87] and hyperband [88].
Data split and validation in an external data set
Fifty-one studies reported how the sample was split for model development (training) and validation (testing) (S3 Table). No universal rule for data separation was found. Split ratios varied, e.g., 80% training and 20% testing [10,28,47,57,71], 90% training and 10% testing [32,33,56,81], and 80% training, 10% validation, and 10% testing [40,41,65,69]. Twenty studies reported cross-validation with 20 folds [66], 10 folds [8,14,33,34,39,45,50,53,57,64,72,76,80,81], 5 folds [13,28,32,38,44,46,48,67,74,78,79], or 7 folds [83]. Thirteen studies performed an out-of-sample external validation [6,7,29–31,35,47,49,56,59,62,72,74]. Choi et al. [47] performed external tests using two distinct types of datasets: temporal data, which was obtained in a different period from the model development, and geographically separated data, which was collected from a different center. Li et al. [35] utilized a dataset from another medical center that used a different plain radiographic technique.
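The two internal validation schemes described above can be sketched with index bookkeeping alone. The example below is illustrative only: the function names, the fixed seed, and the sample size are assumptions, and it shuffles individual samples without grouping images by patient.

```python
import random


def train_val_test_split(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle indices 0..n-1 and cut them into training, validation, and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


# Hypothetical dataset of 1,000 images.
train, val, test = train_val_test_split(1000)
print(len(train), len(val), len(test))
for fold_train, fold_test in k_fold_indices(1000, k=10):
    print(len(fold_train), len(fold_test))
```

External validation differs from both schemes in that the test data come from a separate period or center and never pass through this shuffling step.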
Meta-analysis
We extracted 66 contingency tables, one for each selected study (S4 Table). The overall pooled sensitivity and specificity, calculated using logit transformation, were 91% (95% CI: 88, 93) and 90% (95% CI: 88, 92), respectively (Table 2). The pooled sensitivities for hip and vertebral fractures were found to be 92% (95% CI: 87–96) and 86% (95% CI: 82–89), respectively, while the pooled specificities for these fractures were 90% (95% CI: 85–93) and 86% (95% CI: 81–90), respectively (Table 2). The unified hierarchical summary receiver operating characteristic curve for different fracture types is shown in Fig 2. The area under the curve (AUC) was highest for femoral neck fractures at 0.98, followed by other fractures (0.97), multiple fractures (0.93), hip fractures (0.91), wrist (0.86), and vertebral (0.84).
A: Hip (18 studies), B: Vertebral (20 studies), C: Wrist (3 studies), D: Femoral Neck (4 studies), E: Multiple (11 studies), and F: Others (10 studies).
Studies with only one selected fracture outcome (cervical spine, hand, lumbar spine, proximal humerus, supracondylar, and trabecular bone) were omitted.
Sensitivity analysis
Arcsine transformation yielded similar results with the pooled sensitivity at 89% (95% CI: 87, 91) and specificity at 88% (95% CI: 86, 91). Among data types, studies using only image data exhibited superior diagnostic performance with sensitivity and specificity at 91% (95% CI: 88, 93) and 89% (95% CI: 78, 91) using the arcsine transformation (Table 3). Studies employing radiographs displayed the highest sensitivity (92% [95% CI: 89, 95]) and specificity (90% [95% CI: 87, 93]) using the arcsine transformation (Table 4).
Studies with only one selected image modality (Radiograph + CT + MRI, Radiograph + MRI, UGWSI) were omitted.
Subgroup analysis
Among data types, studies using only image data exhibited superior diagnostic performance with sensitivity and specificity at 92% (95% CI: 90, 94) and 91% (95% CI: 88, 93), respectively, when using logit transformation (Table 3). Studies employing radiographs displayed the highest sensitivity (94% [95% CI: 90, 96]) and specificity (92% [95% CI: 89, 94]) using logit transformation (Table 4). The AUC for radiograph studies (0.94) was higher than studies using radiograph and CT together (0.89) or MRI alone (0.88). The diagnostic odds ratio (DOR) was highest for hip fractures at 99.50 (95% CI: 39.37, 251.48) compared to vertebral fractures (38.26 [95% CI: 21.36, 68.51]) (Table 2). The AUC for image data studies (0.96) was higher than that for those using tabular and images together (0.83) or tabular data alone (0.81) (Fig 3).
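To make the diagnostic odds ratio concrete, the sketch below computes it for a single contingency table together with a Wald-type 95% confidence interval on the log scale. The counts are hypothetical, and the 0.5 continuity correction for empty cells is an assumption; pooled values such as those in Table 2 come from combining many such log odds ratios with inverse-variance weights.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Return (DOR, lower 95% limit, upper 95% limit) for one 2x2 table."""
    cells = [tp, fp, fn, tn]
    # Add 0.5 to every cell when any cell is empty, so the log is defined.
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    log_dor = math.log((tp * tn) / (fp * fn))
    se = math.sqrt(1.0 / tp + 1.0 / fp + 1.0 / fn + 1.0 / tn)
    return (math.exp(log_dor),
            math.exp(log_dor - 1.96 * se),
            math.exp(log_dor + 1.96 * se))


# Hypothetical study: sensitivity 92% (184/200) and specificity 90% (270/300).
dor, lower, upper = diagnostic_odds_ratio(tp=184, fp=30, fn=16, tn=270)
print(round(dor, 1), round(lower, 1), round(upper, 1))
```

A DOR of 1 means the test does not discriminate between fracture and non-fracture cases; larger values indicate better discrimination.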
A: image (54 studies), B: tabular (9 studies), and C: image and tabular (3 studies).
Publication bias
The assessment of publication bias encompassed each fracture outcome and each data type (S5 and S6 Tables, S1–S3 Figs). The contour-enhanced funnel plot illustrated the study distribution, and its enhanced contour facilitated the identification of potential bias (S1–S3 Figs). Notably, asymmetrical distribution was evident for hip and vertebral fracture outcomes and for studies that used image data only (S1 Fig and S3 Fig). This asymmetry implies the presence of possible publication bias, particularly pronounced in studies with smaller sample sizes. However, the trim-and-fill method corrected this asymmetry, rendering the distribution symmetrical (S2 Fig and S3 Fig). After adjustment for publication bias with the trim-and-fill method, the diagnostic odds ratio (DOR) remained statistically significant (S5 and S6 Tables).
Risk of bias and applicability
The assessment of bias and applicability for 66 studies revealed moderate to low concerns (Table 5 and Fig 4). Patient selection and reference standards were the primary concerns for bias and applicability. Many studies lacked the reporting of sample characteristics such as gender and age, limiting generalizability. Some studies did not report patient selection or reference standard computation methods [62,75,78]. Threshold adjustments in some studies might have led to overfitting, reducing the generalizability of the models [72]. Most studies exhibited applicability concerns and were not readily generalizable to other populations. For example, one study [66] focused on patients visiting the emergency department for acute proximal femoral fracture, limiting generalizability to the general population. Another study included patients with existing vertebral fractures, reducing generalizability to the general population. Data preprocessing often involved the removal of occult fractures, with some studies excluding radiographic occult fractures requiring additional modalities for confirmation [53]. Other studies excluded images with uncertain, traumatic, or pathological fractures or those with insufficient quality or resolution [58]. A few studies did not provide specific locations for fracture types or specify which ones were included [12,70].
The risk of bias was measured in four domains: patient selection, index test, reference standard, and flow and timing. The risk of applicability was evaluated with three domains: patient selection, index test, and reference standard.
Discussion
Our systematic review and meta-analysis offers the most current and comprehensive evaluation of the diagnostic accuracy of Artificial Intelligence (AI) for predicting various osteoporotic fracture outcomes using various imaging modalities and data types. This study represents the first systematic review and quantitative meta-analysis of AI’s diagnostic accuracy and comparison using different data types across multiple fracture outcomes. Our analysis reveals four major findings. First, AI provides high classification accuracy for fracture detection when utilizing imaging data, with a pooled sensitivity of 92% (95% CI: 90, 94). Convolutional neural networks with transfer learning exhibit significantly high accuracy when using image data in classifying fractures. Second, our study comprehensively reviews diagnostic accuracy among different image modalities with AI. While all image modalities provide comparable results, AI with radiograph images yields the highest results with a pooled sensitivity of 94% (95% CI: 90, 96). Third, our sensitivity analysis using the arcsine transformation, alongside the primary analysis using the logit transformation, supports the robustness of our findings. Both methodologies yielded similar results regarding pooled sensitivity and specificity, which underscores the reliability and consistency of our findings. Fourth, significant flaws were observed in the study design and reporting of AI for real-world applicability. For example, only a few studies described the patient characteristics of data, and only half (n = 33) reported the hyperparameter selection process.
Our findings align with other systematic reviews and meta-analyses [15,16], showing that AI demonstrates considerably high pooled sensitivity and specificity. However, inconsistent results have been observed when comparing different image modalities in fracture detection. External validation enables a more robust demonstration of clinical utility than simple internal train/test cross-validation. Our study shows that only thirteen studies (20%) out of sixty-six performed external validation. External validation is limited by the lack of large, labeled datasets, which stems from resistance to sharing data across institutions because of patient privacy issues and from the need for experts to label the datasets. Although external validation enhances the robustness of AI systems, it could potentially attenuate their impact on the system. Consequently, it is crucial to acknowledge that external validation might not always be advisable due to the potential impact of factors like sample size and the diversity of the training set. Two systematic reviews [89,90] provide valuable insights into the current limitations of AI studies. A broad discussion of possible solutions is necessary because methodological challenges, risk of bias, and applicability concerns can arise in AI during all stages of development, including data curation, model selection, implementation, and validation. Both reviews recommend that researchers follow standardized reporting guidelines to determine the risk of bias and improve methodological quality assessment.
Our study has limitations. The major one is that only a few eligible studies employed tabular data or combined tabular and image data. Second, we excluded non-English-language articles, so we may have overlooked some studies published in other languages. Third, many of the included studies had study design flaws. They were classified as having great concern for bias and applicability, limiting the conclusions that could be drawn from the meta-analysis because studies with a high risk of bias and applicability concerns overestimated algorithm performance.
This systematic review and meta-analysis has important implications for clinical practice. Given the high diagnostic performance of AI, these techniques could be integrated into existing fracture risk assessment tools to enhance the identification of patients at risk and facilitate early intervention. Healthcare professionals should be trained in interpreting and applying these methods in clinical practice.
This study observed superior prediction performance with single radiograph input data over multimodal imaging, which can be attributed to the radiographs’ consistent and standardized anatomical view, reducing noise and variability inherent in multimodal inputs [91]. Radiographs precisely capture fracture-relevant features, while added modalities like CT and MRI can diversify and possibly weaken these key features [92]. Multimodal inputs can also elevate overfitting risks, particularly with limited datasets [93]. Radiographs, being more accessible and cost-effective than CT or MRI, allow for larger, representative datasets enhancing model performance. The decision between single radiographs and multimodal inputs should be rooted in the research context, data availability, and prediction objectives. Despite the evident advantages of radiographs, specific scenarios may warrant multimodal integration for improved predictions. We also observed that solely relying on image data produced better AUC values than combining it with tabular data. Image data’s richness and direct relevance to fracture detection offer clear diagnostic advantages [94]. Convolutional neural networks (CNNs), identified in our study, are adept at processing this data, emphasizing subtle fracture-related visual nuances [95]. In contrast, tabular data could infuse noise and inconsistencies. Sole image data ensures focus on vital visual features and offers a more standardized data format than diverse tabular inputs.
Further research is needed to address the limitations identified in the included studies and to explore the performance of specific ML and DL algorithms. Researchers should provide more detailed information about their study populations and methods, including patient selection, fracture type location, and the reference standard used. Future studies should also investigate the impact of factors such as training dataset size, model architecture, and the inclusion of clinical and demographic variables on the diagnostic performance of AI. Future research will help develop more accurate and generalizable models for predicting osteoporotic fractures and inform evidence-based clinical practice. Several novel diagnostic meta-analysis methodologies have recently been introduced [96–98]. Nevertheless, because few of the selected studies focused on fractures other than vertebral and hip fractures, and few involved tabular data or combined tabular and image data, incorporating these methodologies into our present study was unfeasible. While we acknowledge their potential applicability, the current study’s unique characteristics led us to refrain from their implementation. We will implement these methodologies in our forthcoming investigations, particularly as more comprehensive studies become available. To aid future researchers, we provide an array of crucial challenges and their potential resolutions pertinent to applying machine learning or deep learning for fracture diagnosis (S7 Table).
In conclusion, our meta-analysis highlights the high diagnostic accuracy of AI in various fracture outcomes. As AI demonstrates reliable results in fracture detection, it holds the potential to streamline fracture diagnosis in healthcare systems. However, transparent reporting of study methods and designs for AI development and validation is essential to ensure their real-world applicability. By addressing the current research landscape’s limitations and promoting standardized guidelines, we can facilitate the integration of AI technologies into clinical practice and enhance the prediction of osteoporotic fractures, ultimately leading to improved patient care.
Supporting information
S1 Text. The search term used for each engine: 1) PubMed, 2) Web of Science, and 3) IEEE.
https://doi.org/10.1371/journal.pdig.0000438.s002
(DOCX)
S1 Table. A characteristic of 57 selected studies for Image modality, Image Data Type, and Data Source.
https://doi.org/10.1371/journal.pdig.0000438.s003
(DOCX)
S2 Table. The data source of 9 selected studies used tabular data, and 3 studies (in bold) used both tabular and image data.
https://doi.org/10.1371/journal.pdig.0000438.s004
(DOCX)
S3 Table. A characteristic of 66 selected studies for the unbalanced outcome, a technique used for an unbalanced outcome, data preprocessing, hyperparameters optimization, and performance measurement used.
https://doi.org/10.1371/journal.pdig.0000438.s005
(DOCX)
S4 Table. A summary of the contingency table for 66 selected studies.
https://doi.org/10.1371/journal.pdig.0000438.s006
(DOCX)
S5 Table. Summary of Publication Bias Assessment across different fracture outcomes.
TF: Trim and Fill method, DOR: Diagnostic Odds Ratio, CI: Confidence Interval.
https://doi.org/10.1371/journal.pdig.0000438.s007
(DOCX)
S6 Table. Summary of Publication Bias Assessment across different data types.
TF: Trim and Fill method, DOR: Diagnostic Odds Ratio, CI: Confidence Interval.
https://doi.org/10.1371/journal.pdig.0000438.s008
(DOCX)
S7 Table. Overview of Key Challenges and Potential Resolutions in the Utilization of Machine Learning or Deep Learning for Fracture Diagnosis.
https://doi.org/10.1371/journal.pdig.0000438.s009
(DOCX)
S1 Fig. Contour-Enhanced Funnel Plot for Publication Bias Assessment across Different Fracture Outcomes.
https://doi.org/10.1371/journal.pdig.0000438.s010
(DOCX)
S2 Fig. Contour-Enhanced Funnel Plot for Publication Bias Assessment across Different Fracture Outcomes after Employing the Trim & Fill Method.
The open circle represents the “filled” studies from the Trim & Fill Method in each fracture outcome plot.
https://doi.org/10.1371/journal.pdig.0000438.s011
(DOCX)
S3 Fig. Contour-Enhanced Funnel Plot: Evaluating Publication Bias Across Various Data Types.
The top row illustrates the funnel plot encompassing all studies. The second row shows the Contour-Enhanced Funnel Plot for Publication Bias Assessment after employing the Trim & Fill Method. The open circle designates the studies “filled” through the Trim & Fill Method within each contour-enhanced funnel plot in the second row.
https://doi.org/10.1371/journal.pdig.0000438.s012
(DOCX)
Acknowledgments
This research was partially conducted under the affiliation of the Nevada Institute of Personalized Medicine, College of Sciences (QW, JJ, and JD); the Department of Epidemiology and Biostatistics, School of Public Health (QW and JJ); and the Department of Mathematical Sciences, College of Sciences (BL), all at the University of Nevada, Las Vegas.
References
- 1. Court-Brown CM, Caesar B. Epidemiology of adult fractures: A review. Injury. 2006;37: 691–697. pmid:16814787
- 2. Wu A-M, Bisignano C, James SL, Abady GG, Abedi A, Abu-Gharbieh E, et al. Global, regional, and national burden of bone fractures in 204 countries and territories, 1990–2019: a systematic analysis from the Global Burden of Disease Study 2019. Lancet Healthy Longev. 2021;2: e580–e592. pmid:34723233
- 3. Pike C, Birnbaum HG, Schiller M, Sharma H, Burge R, Edgell ET. Direct and Indirect Costs of Non-Vertebral Fracture Patients with Osteoporosis in the US. PharmacoEconomics. 2010;28: 395–409. pmid:20402541
- 4. Borgström F, Karlsson L, Ortsäter G, Norton N, Halbout P, Cooper C, et al. Fragility fractures in Europe: burden, management and opportunities. Arch Osteoporos. 2020;15: 59. pmid:32306163
- 5. Williamson S, Landeiro F, McConnell T, Fulford-Smith L, Javaid MK, Judge A, et al. Costs of fragility hip fractures globally: a systematic review and meta-regression analysis. Osteoporos Int. 2017;28: 2791–2800. pmid:28748387
- 6. Cheng CT, Ho TY, Lee TY, Chang CC, Chou CC, Chen CC, et al. Application of a deep learning algorithm for detection and visualization of hip fractures on plain pelvic radiographs. Eur Radiol. 2019;29: 5469–5477. pmid:30937588
- 7. Bae J, Yu S, Oh J, Kim TH, Chung JH, Byun H, et al. External Validation of Deep Learning Algorithm for Detecting and Visualizing Femoral Neck Fracture Including Displaced and Non-displaced Fracture on Plain X-ray. J Digit Imaging. 2021;34: 1099–1109. pmid:34379216
- 8. Burns JE, Yao J, Summers RM. Vertebral body compression fractures and bone density: Automated detection and classification on CT Images. Radiology. 2017;284: 788–797. pmid:28301777
- 9. Inoue T, Maki S, Furuya T, Mikami Y, Mizutani M, Takada I, et al. Automated fracture screening using an object detection algorithm on whole-body trauma computed tomography. Sci Rep. 2022;12: 16549. pmid:36192521
- 10. Ramos J. S., de Aguiar E. J., Belizario I. V., Costa M. V. L., Maciel J. G., Cazzolato M. T., et al. Analysis of vertebrae without fracture on spine MRI to assess bone fragility: A Comparison of Traditional Machine Learning and Deep Learning. 2022; 78–83.
- 11. Yabu A, Hoshino M, Tabuchi H, Takahashi S, Masumoto H, Akada M, et al. Using artificial intelligence to diagnose fresh osteoporotic vertebral fractures on magnetic resonance images. Spine J. 2021;000: 1–7. pmid:33722728
- 12. Almog YA, Rai A, Zhang P, Moulaison A, Powell R, Mishra A, et al. Deep Learning with Electronic Health Records for Short-Term Fracture Risk Identification: Crystal Bone Algorithm Development and Validation. J Med Internet Res. 2020;22. pmid:32956069
- 13. Kruse C, Eiken P, Vestergaard P. Machine Learning Principles Can Improve Hip Fracture Prediction. Calcif Tissue Int. 2017;100: 348–360. pmid:28197643
- 14. Wu Q, Nasoz F, Jung J, Bhattarai B, Han MV. Machine Learning Approaches for Fracture Risk Assessment: A Comparative Analysis of Genomic and Phenotypic Data in 5130 Older Men. Calcif Tissue Int. 2020; 1–9. pmid:32728911
- 15. Kuo Rachel Y L, Harrison Conrad, Curran Terry-Ann, Jones Benjamin, Freethy Alexander, Cussons David, et al. Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis. Radiology. 2022;304: 50–62. pmid:35348381
- 16. Yang S, Yin B, Cao W, Feng C, Fan G, He S. Diagnostic accuracy of deep learning in orthopaedic fractures: a systematic review and meta-analysis. Clin Radiol. 2020;75: 713.e17–713.e28. pmid:32591230
- 17. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372: n71. pmid:33782057
- 18. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PMM. The diagnostic odds ratio: A single indicator of test performance. J Clin Epidemiol. 2003;56: 1129–1135. pmid:14615004
- 19. Sterne JAC, Gavaghan D, Egger M. Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. J Clin Epidemiol. 2000;53: 1119–1129. pmid:11106885
- 20. Clopper CJ, Pearson ES. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial. Biometrika. 1934;26: 404.
- 21. Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58: 882–893. pmid:16085191
- 22. Barendregt JJ, Doi SA, Lee YY, Norman RE, Vos T. Meta-analysis of prevalence. J Epidemiol Community Health. 2013;67: 974–978. pmid:23963506
- 23.
Team RC. R: A language and environment for statistical computing. R Found Stat Comput Vienna Austria. 2019;3. Available: https://www.r-project.org/
- 24.
Schwarzer G. meta: An R Package for Meta-Analysis. R News. 2007. Available: http://cran.r-project.org/doc/Rnews/
- 25. Doebler P, Holling H. Meta-Analysis of Diagnostic Accuracy with mada. Compr R Arch Netw. 2012; 1–15.
- 26. Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. J Clin Epidemiol. 2008;61: 991–996. pmid:18538991
- 27. Whiting PF, Rutjes AWW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: A Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies. Ann Intern Med. 2011;155: 529–536. pmid:22007046
- 28. Chen HY, Hsu BWY, Yin YK, Lin FH, Yang TH, Yang RS, et al. Application of deep learning algorithm to detect and visualize vertebral fractures on plain frontal radiographs. PLoS ONE. 2021;16: 1–10. pmid:33507982
- 29. Cheng CT, Chen CC, Cheng FJ, Chen HW, Su YS, Yeh CN, et al. A human-algorithm integration system for hip fracture detection on plain radiography: System development and validation study. JMIR Med Inform. 2020;8: 1–13. pmid:33245279
- 30. Cheng CT, Wang Y, Chen HW, Hsiao PM, Yeh CN, Hsieh CH, et al. A scalable physician-level deep learning algorithm detects universal trauma on pelvic radiographs. Nat Commun. 2021;12. pmid:33594071
- 31. Chou PH, Jou TH, Wu HH, Yao YC, Lin HH, Chang MC, et al. Ground truth generalizability affects performance of the artificial intelligence model in automated vertebral fracture detection on plain lateral radiographs of the spine. Spine J. 2022;22: 511–523. pmid:34737066
- 32. Liu Q, Cui X, Chou YC, Abbod MF, Lin J, Shieh JS. Ensemble artificial neural networks applied to predict the key risk factors of hip bone fracture for elders. Biomed Signal Process Control. 2015;21: 146–156.
- 33. Tseng WJ, Hung LW, Shieh JS, Abbod MF, Lin J. Hip fracture risk assessment: Artificial neural network outperforms conditional logistic regression in an age- and sex-matched case control study. BMC Musculoskelet Disord. 2013;14. pmid:23855555
- 34. Yeh LR, Zhang Y, Chen JH, Liu YL, Wang AC, Yang JY, et al. A deep learning-based method for the diagnosis of vertebral fractures on spine MRI: retrospective training and validation of ResNet. Eur Spine J. 2022;31: 2022–2030. pmid:35089420
- 35. Li YC, Chen HH, Horng-Shing Lu H, Hondar Wu HT, Chang MC, Chou PH. Can a Deep-learning Model for the Automated Detection of Vertebral Fractures Approach the Performance Level of Human Subspecialists? Clin Orthop Relat Res. 2021;479: 1598–1612. pmid:33651768
- 36. Yoon AP, Lee Y-L, Kane RL, Kuo C-F, Lin C, Chung KC. Development and Validation of a Deep Learning Model Using Convolutional Neural Networks to Identify Scaphoid Fractures in Radiographs. JAMA Netw Open. 2021;4: e216096. pmid:33956133
- 37. Mawatari T, Hayashida Y, Katsuragawa S, Yoshimatsu Y, Hamamura T, Anai K, et al. The effect of deep convolutional neural networks on radiologists’ performance in the detection of hip fractures on digital pelvic radiographs. Eur J Radiol. 2020;130: 109188. pmid:32721827
- 38. Murata K, Endo K, Aihara T, Suzuki H, Sawaji Y, Matsuoka Y, et al. Artificial intelligence for the detection of vertebral fractures on plain spinal radiography. Sci Rep. 2020;10: 1–8. pmid:33208824
- 39. Nishiyama KK, Ito M, Harada A, Boyd SK. Classification of women with and without hip fracture based on quantitative computed tomography and finite element analysis. Osteoporos Int. 2014;25: 619–626. pmid:23948875
- 40. Sato Y, Takegami Y, Asamoto T, Ono Y, Hidetoshi T, Goto R. Artificial intelligence improves the accuracy of residents in the diagnosis of hip fractures: a multicenter study. BMC Musculoskelet Disord. 2021; 1–10.
- 41. Urakawa T, Tanaka Y, Goto S, Matsuzawa H, Watanabe K, Endo N. Detecting intertrochanteric hip fractures with orthopedist-level accuracy using a deep convolutional neural network. Skeletal Radiol. 2019;48: 239–244. pmid:29955910
- 42. Yamada Y, Maki S, Kishida S, Nagai H, Arima J, Yamakawa N, et al. Automated classification of hip fractures using deep convolutional neural networks with orthopedic surgeon-level accuracy: ensemble decision-making with antero-posterior and lateral radiographs. Acta Orthop. 2020;91: 699–704. pmid:32783544
- 43. Yamamoto N, Rahman R, Yagi N, Hayashi K, Maruo A, Muratsu H, et al. An automated fracture detection from pelvic CT images with 3-D convolutional neural networks. 2020 Int Symp Community-Centric Syst CcS 2020. 2020; 3–8.
- 44. Yoda T, Maki S, Furuya T, Yokota H, Matsumoto K, Takaoka H, et al. Automated Differentiation Between Osteoporotic Vertebral Fracture and Malignant Vertebral Fracture on MRI Using a Deep Convolutional Neural Network. Spine. 2022;47: E347–E352. pmid:34919075
- 45. Chung SW, Han SS, Lee JW, Oh KS, Kim NR, Yoon JP, et al. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 2018;89: 468–473. pmid:29577791
- 46. Chen W, Liu X, Li K, Luo Y, Bai S, Wu J, et al. A deep-learning model for identifying fresh vertebral compression fractures on digital radiography. Eur Radiol. 2022;32: 1496–1505. pmid:34553256
- 47. Choi JW, Cho YJ, Lee S, Lee J, Lee S, Choi YH, et al. Using a Dual-Input Convolutional Neural Network for Automated Detection of Pediatric Supracondylar Fracture on Conventional Radiography. Invest Radiol. 2020;55: 101–110. pmid:31725064
- 48. Liu P, Lu L, Chen Y, Huo T, Xue M, Wang H, et al. Artificial intelligence to detect the femoral intertrochanteric fracture: The arrival of the intelligent-medicine era. Front Bioeng Biotechnol. 2022;10. Available: https://www.frontiersin.org/articles/10.3389/fbioe.2022.927926 pmid:36147533
- 49. Mu L, Qu T, Dong D, Li X, Pei Y, Wang Y, et al. Fine-Tuned Deep Convolutional Networks for the Detection of Femoral Neck Fractures on Pelvic Radiographs: A Multicenter Dataset Validation. IEEE Access. 2021;9: 78495–78503.
- 50. Li Y, Zhang Y, Zhang E, Chen Y, Wang Q, Liu K, et al. Differential diagnosis of benign and malignant vertebral fracture on CT using deep learning. Eur Radiol. 2021. pmid:33993335
- 51. Derkatch S, Kirby C, Kimelman D, Jozani MJ, Michael Davidson J, Leslie WD. Identification of vertebral fractures by convolutional neural networks to predict nonvertebral and hip fractures: A Registry-based Cohort Study of Dual X-ray Absorptiometry. Radiology. 2019;293: 404–411. pmid:31526255
- 52. Guermazi A, Tannoury C, Kompel AJ, Murakami AM, Ducarouge A, Gillibert A, et al. Improving Radiographic Fracture Recognition Performance and Efficiency Using Artificial Intelligence. Radiology. 2022;302: 627–636. pmid:34931859
- 53. Gupta V, Demirer M, Bigelow M, Yu SM, Yu JS, Prevedello LM, et al. Using Transfer Learning and Class Activation Maps Supporting Detection and Localization of Femoral Fractures on Anteroposterior Radiographs. Proc—Int Symp Biomed Imaging. 2020;2020-April: 1526–1529.
- 54. Hayashi D, Kompel AJ, Ventre J, Ducarouge A, Nguyen T, Regnard NE, et al. Automated detection of acute appendicular skeletal fractures in pediatric patients using deep learning. Skelet Radiol. 2022;51: 2129–2139. pmid:35522332
- 55. Kitamura G. Deep learning evaluation of pelvic radiographs for position, hardware presence, and fracture detection. Eur J Radiol. 2020;130: 109139. pmid:32623269
- 56. Lindsey R, Daluiski A, Chopra S, Lachapelle A, Mozer M, Sicular S, et al. Deep neural network improves fracture detection by clinicians. Proc Natl Acad Sci U S A. 2018;115: 11591–11596. pmid:30348771
- 57. Mehta SD, Sebro R. Computer-Aided Detection of Incidental Lumbar Spine Fractures from Routine Dual-Energy X-Ray Absorptiometry (DEXA) Studies Using a Support Vector Machine (SVM) Classifier. J Digit Imaging. 2020;33: 204–210. pmid:31062114
- 58. Monchka BA, Kimelman D, Lix LM, Leslie WD. Feasibility of a generalized convolutional neural network for automated identification of vertebral compression fractures: The Manitoba Bone Mineral Density Registry. Bone. 2021;150: 116017. pmid:34020078
- 59. Monchka BA, Schousboe JT, Davidson MJ, Kimelman D, Hans D, Raina P, et al. Development of a manufacturer-independent convolutional neural network for the automated identification of vertebral compression fractures in vertebral fracture assessment images using active learning. Bone. 2022;161: 116427. pmid:35489707
- 60. Mutasa S, Varada S, Goel A, Wong TT, Rasiej MJ. Advanced Deep Learning Techniques Applied to Automated Femoral Neck Fracture Detection and Classification. J Digit Imaging. 2020;csvgrdgnfmb mfs 33: 1209–1217. pmid:32583277
- 61. Nguyen T, Maarek R, Hermann AL, Kammoun A, Marchi A, Khelifi-Touhami MR, et al. Assessment of an artificial intelligence aid for the detection of appendicular skeletal fractures in children and young adults by senior and junior radiologists. Pediatr Radiol. 2022;52: 2215–2226. pmid:36169667
- 62. Oakden-rayner L, Gale W, Bonham TA, Lungren MP, Carneiro G, Bradley AP, et al. Validation and algorithmic audit of a deep learning system for the detection of proximal femoral fractures in patients in the emergency department: a diagnostic accuracy study. The Lancet. 2022;7500: 4–8. pmid:35396184
- 63. JE S, Osler P, Paul AB, Kunst M. CT Cervical Spine Fracture Detection Using a Convolutional Neural Network. AJNR Am J Neuroradiol. 2021;42: 1341–1347. pmid:34255730
- 64. Su Y, Kwok TCY, Cummings SR, Yip BHK, Cawthon PM. Can Classification and Regression Tree Analysis Help Identify Clinically Meaningful Risk Groups for Hip Fracture Prediction in Older American Men (The MrOS Cohort Study)? JBMR Plus. 2019;3: 1–6. pmid:31687643
- 65. Tomita N, Cheung YY, Hassanpour S. Deep neural networks for automatic detection of osteoporotic vertebral fractures on CT scans. Comput Biol Med. 2018;98: 8–15. pmid:29758455
- 66. Yu JS, Yu SM, Erdal BS, Demirer M, Gupta V, Bigelow M, et al. Detection and localisation of hip fractures on anteroposterior radiographs with artificial intelligence: proof of concept. Clin Radiol. 2020;75: 237.e1-237.e9. pmid:31787211
- 67. Beyaz S, Açici K, Sümer E. Femoral neck fracture detection in X-ray images using deep learning and genetic algorithm approaches. Jt Dis Relat Surg. 2020;31: 175–183. pmid:32584712
- 68. Galassi A, Martín-Guerrero JD, Villamor E, Monserrat C, Rupérez MJ. Risk Assessment of Hip Fracture Based on Machine Learning. Appl Bionics Biomech. 2020. pmid:33425008
- 69. Kim DH, MacKinnon T. Artificial intelligence in fracture detection: transfer learning from deep convolutional neural networks. Clin Radiol. 2018;73: 439–445. pmid:29269036
- 70. Lemineur G, Harba R, Kilic N, Ucan ON, Osman O, Benhamou L. Efficient estimation of osteoporosis using Artificial Neural Networks. IECON Proc Ind Electron Conf. 2007; 3039–3044.
- 71. Minonzio JG, Cataldo B, Olivares R, Ramiandrisoa D, Soto R, Crawford B, et al. Automatic classifying of patients with non-traumatic fractures based on ultrasonic guided wave spectrum image using a dynamic support vector machine. IEEE Access. 2020;8: 194752–194764.
- 72. Nissinen T, Suoranta S, Saavalainen T, Sund R, Hurskainen O. Detecting pathological features and predicting fracture risk from dual-energy X-ray absorptiometry images using deep learning. Bone Rep. 2021;14. pmid:33997147
- 73. Ozkaya E, Topal FE, Bulut T, Gursoy M, Ozuysal M, Karakaya Z. Evaluation of an artificial intelligence system for diagnosing scaphoid fracture on direct radiography. Eur J Trauma Emerg Surg. 2022;48: 585–592. pmid:32862314
- 74. Raisuddin AM, Vaattovaara E, Nevalainen M, Nikki M, Järvenpää E, Makkonen K, et al. Critical evaluation of deep neural networks for wrist fracture detection. Sci Rep. 2021;11: 6006. pmid:33727668
- 75. Regnard NE, Lanseur B, Ventre J, Ducarouge A, Clovis L, Lassalle L, et al. Assessment of performances of a deep learning algorithm for the detection of limbs and pelvic fractures, dislocations, focal bone lesions, and elbow effusions on trauma X-rays. Eur J Radiol. 2022;154: 110447. pmid:35921795
- 76. Rosenberg GS, Cina A, Schiró GR, Giorgi PD, Gueorguiev B, Alini M, et al. Artificial Intelligence Accurately Detects Traumatic Thoracolumbar Fractures on Sagittal Radiographs. Medicina (Mex). 2022;58: 998. pmid:35893113
- 77. Ulivieri FM, Rinaudo L, Piodi LP, Messina C, Sconfienza LM, Sardanelli F, et al. Bone strain index as a predictor of further vertebral fracture in osteoporotic women: An artificial intelligence-based analysis. PLoS ONE. 2021;16: 1–13. pmid:33556061
- 78. Üreten K, Sevinç HF, İğdeli U, Onay A, Maraş Y. Use of deep learning methods for hand fracture detection from plain hand radiographs. Ulus Travma Acil Cerrahi Derg. 2022;28: 196–201. pmid:35099027
- 79. Ho-Le TP, Center JR, Eisman JA, Nguyen TV, Nguyen HT. Prediction of hip fracture in post-menopausal women using artificial neural network approach. Proc Annu Int Conf IEEE Eng Med Biol Soc EMBS. 2017; 4207–4210. pmid:29060825
- 80. Del Lama RS, Candido RM, Chiari-Correia NS, Nogueira-Barbosa MH, de Azevedo-Marques PM, Tinós R. Computer-Aided Diagnosis of Vertebral Compression Fractures Using Convolutional Neural Networks and Radiomics. J Digit Imaging. 2022;35: 446–458. pmid:35132524
- 81. Korfiatis VC, Tassani S, Matsopoulos GK. A New Ensemble Classification System For Fracture Zone Prediction Using Imbalanced Micro-CT Bone Morphometrical Data. IEEE J Biomed Health Inform. 2018;22: 1189–1196. pmid:28692998
- 82. Raghavendra U, Bhat NS, Gudigar A, Acharya UR. Automated system for the detection of thoracolumbar fractures using a CNN architecture. Future Gener Comput Syst. 2018;85: 184–189.
- 83. Salehinejad H, Ho E, Lin H, Crivellaro P, Samorodova O, Arciniegas MT, et al. Deep Sequential Learning For Cervical Spine Fracture Detection On Computed Tomography Imaging. IEEE 18th Int Symp Biomed Imaging. 2021; 1911–1914.
- 84. Yuzhao W, Tian B, Tong L, Lang H. Osteoporotic Vertebral Fracture Classification in X-rays Based on a Multi-modal Semantic Consistency Network. J BIONIC Eng. 2022;19: 1816–1829.
- 85. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2016;2016-Decem: 2818–2826.
- 86. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002;16: 321–357.
- 87. Bergstra J, Bengio Y. Random Search For Hyper-Parameter Optimization. J Mach Learn Res. 2012;13: 281–305.
- 88. Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A. Hyperband: A novel bandit-based approach to hyperparameter optimization. J Mach Learn Res. 2018;18: 1–52.
- 89. Zhou Q, Chen Z, Cao Y, Peng S. Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review. Npj Digit Med. 2021;4: 1–12. pmid:34711955
- 90. Navarro CLA, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ. 2021;375: n2281. pmid:34670780
- 91. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18: 500–510. pmid:29777175
- 92. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging—“how-to” guide and critical reflection. Insights Imaging. 2020;11: 91. pmid:32785796
- 93. Boehm KM, Khosravi P, Vanguri R, Gao J, Shah SP. Harnessing multimodal data integration to advance precision oncology. Nat Rev Cancer. 2022;22: 114–126. pmid:34663944
- 94. Krupinski EA. Current perspectives in medical image perception. Atten Percept Psychophys. 2010;72: 1205–1217. pmid:20601701
- 95. Shorten C, Khoshgoftaar TM. A survey on Image Data Augmentation for Deep Learning. J Big Data. 2019;6: 60.
- 96. Preisser JS, Inan G, Powers JM, Chu H. A population-averaged approach to diagnostic test meta-analysis. Biom J. 2019;61: 126–137. pmid:30370548
- 97.
Xiaoye Ma Chu YL, Chen Yong, Stijnen Theo, Haitao . Meta-Analysis of Diagnostic Tests. Handbook of Meta-Analysis. Chapman and Hall/CRC; 2020.
- 98. Liu Z, Al Amer FM, Xiao M, Xu C, Furuya-Kanamori L, Hong H, et al. The normality assumption on between-study random effects was questionable in a considerable number of Cochrane meta-analyses. BMC Med. 2023;21: 112. pmid:36978059