Accepted for/Published in: JMIR Dermatology
Date Submitted: Nov 23, 2021
Date Accepted: Aug 3, 2022
Date Submitted to PubMed: Aug 26, 2023
Generalisability of Deep Learning Models Trained on Standardised and Non-Standardised Images: Retrospective Comparative Study
ABSTRACT
Background:
Convolutional neural networks (CNNs) are a type of artificial intelligence (AI) that shows promise as a diagnostic aid for skin cancer. However, most are trained on retrospective image datasets with varying degrees of image capture standardisation.
Objective:
The primary objective of our study was to train CNN models sharing the same architecture on image sets acquired either with a single image capture device and technique (standardised) or with varied devices and capture techniques (non-standardised), and to test how their performance varies when classifying skin cancer images from different populations.
Methods:
Three CNNs with the same architecture were trained. CNN-NS was trained on 25,331 images from the International Skin Imaging Collaboration (ISIC) taken with different image capture devices (non-standardised). CNN-S was trained on 235,268 MoleMap images taken with the same capture device (standardised), and CNN-S2 was trained on a subset of 25,331 standardised MoleMap images (matched to CNN-NS for number and classes of training images). These three models were then tested on three external test sets: 569 Danish images, the publicly available ISIC 2020 dataset of 33,126 images, and a UQ dataset of 422 images. Primary outcome measures were sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). Tele-dermatology assessments available for the Danish dataset were used to compare model performance with that of tele-dermatologists.
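For readers less familiar with the primary outcome measures, the sketch below shows how sensitivity, specificity, and AUROC are conventionally computed from binary labels and classifier scores. This is an illustrative, self-contained example only, not the authors' evaluation code; the threshold and toy data are assumptions.

```python
# Illustrative sketch (not the study's pipeline): computing the abstract's
# primary outcome measures from binary labels (1 = malignant) and model scores.

def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP) at a fixed threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): three positives and three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
sens, spec = sensitivity_specificity(labels, scores)
area = auroc(labels, scores)  # 8 of 9 positive-negative pairs correctly ranked
```

In practice a library routine (e.g. scikit-learn's `roc_auc_score`) would be used, but the rank-based definition above is what it computes.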
Results:
When tested on the 569 Danish images, the standardised models outperformed the non-standardised model: CNN-S achieved an AUROC of 0.861 (CI 0.830-0.889; P=.001) and CNN-S2 an AUROC of 0.831 (CI 0.798-0.861; P=.009), versus 0.759 (CI 0.722-0.794) for CNN-NS (Figure 3). When tested on the two additional datasets, CNN-S and CNN-S2 again outperformed CNN-NS (ISIC 2020: P<.001, P<.001; UQ: P=.076, P=.347). When the CNNs were matched to the mean sensitivity and specificity of the tele-dermatologists on the Danish dataset, the tele-dermatologists surpassed the models' resultant sensitivities and specificities (Table 5); however, the differences from CNN-S were not statistically significant (P=.10, P=.053). Performance of all CNN models, as well as of the tele-dermatologists, was influenced by image quality.
Conclusions:
CNNs trained on standardised images had improved performance and therefore greater generalisability in skin cancer classification when applied to unseen datasets. This is an important consideration for future algorithm development, regulation and approval.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.