Original article/Research, New developments & Artificial Intelligence
External validation of a commercially available deep learning algorithm for fracture detection in children

https://doi.org/10.1016/j.diii.2021.10.007
Under an Elsevier user license
open archive

Highlights

  • Deep learning algorithms lack real-world external validation prior to clinical use.

  • The tested deep learning algorithm shows strong diagnostic performance in children.

  • Sensitivity of the tested algorithm is lower in children under 4 years.

Abstract

Purpose

The purpose of this study was to conduct an external validation of a fracture assessment deep learning algorithm (Rayvolve®) using digital radiographs from a real-life cohort of children presenting routinely to the emergency room.

Materials and methods

This retrospective study included 2634 radiography sets (5865 images) from 2549 children (1459 boys, 1090 girls; mean age, 8.5 ± 4.5 [SD] years; age range: 0–17 years) referred by the pediatric emergency room for trauma. For each set, the presence of one or more fractures, the number of fractures, and their locations were recorded as determined by the senior radiologists and by the algorithm. Using the senior radiologist diagnosis as the standard of reference, the diagnostic performance of the deep learning algorithm (Rayvolve®) was calculated using three approaches: a detection approach (presence/absence of a fracture as a binary variable), an enumeration approach (exact number of fractures detected) and a localization approach (focusing on whether the detected fractures were correctly localized). Subgroup analyses were performed according to the presence or absence of a cast, age category (0–4 vs. 5–18 years) and anatomical region.
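As an illustration of how such diagnostic performance measures follow from the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), a minimal Python sketch is given here. It is not the authors' analysis code: the function names are hypothetical, the counts in the usage lines are invented for illustration only, and the Wilson score interval is an assumed choice, since the method used for the 95% CIs is not stated in this abstract.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, z=1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard measures from a 2x2 table against the reference standard.

    Assumes every denominator is non-zero (in particular specificity
    strictly between 0 and 1 for the likelihood ratios).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_plus": sensitivity / (1 - specificity),
        "lr_minus": (1 - sensitivity) / specificity,
    }


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    tp, fp, tn, fn = 90, 20, 180, 10
    for name, value in diagnostic_metrics(tp, fp, tn, fn).items():
        print(f"{name}: {value:.3f}")
    low, high = wilson_interval(tp, tp + fn)
    print(f"sensitivity 95% CI: {low:.3f} to {high:.3f}")
```

With the detection approach, each radiography set would contribute one entry to this 2x2 table; how sets are counted under the enumeration and localization approaches depends on matching rules that this abstract does not detail.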

Results

For the detection approach, the deep learning algorithm yielded 95.7% sensitivity (95% CI: 94.0–96.9), 91.2% specificity (95% CI: 89.8–92.5) and 92.6% accuracy (95% CI: 91.5–93.6). For both the enumeration and localization approaches, the deep learning algorithm yielded 94.1% sensitivity (95% CI: 92.1–95.6), 88.8% specificity (95% CI: 87.3–90.2) and 90.4% accuracy (95% CI: 89.2–91.5). In the age-related subgroup analyses, the deep learning algorithm yielded greater sensitivity and negative predictive value in the 5–18-years age group than in the 0–4-years age group for the detection approach (P < 0.001 and P = 0.002, respectively) and for the enumeration and localization approaches (P = 0.012 and P = 0.028, respectively). The high negative predictive value was robust, persisting in all of the subgroup analyses, except for patients with casts (P = 0.001 for the detection approach and P < 0.001 for the enumeration and localization approaches).

Conclusion

The Rayvolve® deep learning algorithm is very reliable for detecting fractures in children, especially in those older than 4 years and without a cast.

Keywords

Radiographs
Deep learning algorithm
Artificial intelligence
Fractures
Pediatric

Abbreviations

CI: Confidence interval
DICOM: Digital imaging and communications in medicine
FN: False negative
FP: False positive
LR-: Negative likelihood ratio
LR+: Positive likelihood ratio
NPV: Negative predictive value
PACS: Picture archiving and communication system
PPV: Positive predictive value
ROI: Region of interest
TN: True negative
TP: True positive

1 R.V. and C.A. contributed equally to this work and share last co-authorship.