Thoracic: Lung Cancer
Predicting benign, preinvasive, and invasive lung nodules on computed tomography scans using machine learning

Accepted for the 100th Annual Meeting of The American Association for Thoracic Surgery.
https://doi.org/10.1016/j.jtcvs.2021.02.010
Under a Creative Commons license
open access

Abstract

Objective

The study objective was to investigate if machine learning algorithms can predict whether a lung nodule is benign, adenocarcinoma, or its preinvasive subtype from computed tomography images alone.

Methods

A dataset of chest computed tomography scans containing lung nodules was collected with their pathologic diagnosis from several sources. The dataset was split randomly into training (70%), internal validation (15%), and independent test sets (15%) at the patient level. Two machine learning algorithms were developed, trained, and validated. The first algorithm used the support vector machine model, and the second used deep learning technology: a convolutional neural network. Receiver operating characteristic analysis was used to evaluate the performance of the classification on the test dataset.
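The patient-level split described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_by_patient`, the toy patient identifiers, and the random seed are all assumptions made for illustration. The point it shows is that patients, not individual nodules, are shuffled and partitioned, so every nodule from one patient lands in the same set.

```python
import random
from collections import defaultdict


def split_by_patient(nodule_patient_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign nodule indices to train/val/test so that no patient spans two sets.

    nodule_patient_ids[i] is the (hypothetical) patient ID of nodule i.
    The fractions apply to the number of patients, not the number of nodules.
    """
    patients = sorted(set(nodule_patient_ids))
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),  # remainder goes to the test set
    }

    split = defaultdict(list)
    for idx, pid in enumerate(nodule_patient_ids):
        for name, members in groups.items():
            if pid in members:
                split[name].append(idx)
                break
    return dict(split)


# Hypothetical toy data: 100 patients with 1 to 3 nodules each.
rng = random.Random(1)
patient_ids = [f"P{p:03d}" for p in range(100) for _ in range(rng.randint(1, 3))]
splits = split_by_patient(patient_ids)
```

Because the fractions are applied to patients, the share of nodules in each set can differ somewhat from 70/15/15 when patients contribute unequal numbers of nodules.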

Results

The support vector machine/convolutional neural network–based models classified nodules into 6 categories, resulting in areas under the curve of 0.59/0.65 for atypical adenomatous hyperplasia versus adenocarcinoma in situ, 0.87/0.86 for minimally invasive adenocarcinoma versus invasive adenocarcinoma, 0.76/0.72 for atypical adenomatous hyperplasia + adenocarcinoma in situ versus minimally invasive adenocarcinoma, 0.89/0.87 for atypical adenomatous hyperplasia + adenocarcinoma in situ versus minimally invasive adenocarcinoma + invasive adenocarcinoma, and 0.93/0.92 for atypical adenomatous hyperplasia + adenocarcinoma in situ + minimally invasive adenocarcinoma versus invasive adenocarcinoma. Classifying benign versus atypical adenomatous hyperplasia + adenocarcinoma in situ + minimally invasive adenocarcinoma versus invasive adenocarcinoma resulted in a micro-average area under the curve of 0.93/0.94 for the support vector machine/convolutional neural network models, respectively. The convolutional neural network–based methods had higher sensitivities than the support vector machine–based methods but lower specificities and accuracies.
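For readers unfamiliar with the micro-average area under the curve used for the 3-class result, the sketch below shows one common way to compute it: each nodule's label is binarized one-vs-rest, all (label, score) pairs are pooled across classes, and a single binary area under the curve is computed on the pool. This is an illustrative assumption about the metric, not the authors' implementation. The function names, class encoding (0 = benign, 1 = preinvasive group, 2 = invasive adenocarcinoma), and toy probabilities are hypothetical.

```python
def binary_auc(labels, scores):
    """Binary AUC as the probability that a positive outscores a negative.

    Ties count as half a win (the Mann-Whitney formulation). The pairwise
    loop is O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_average_auc(y_true, y_prob, n_classes):
    """Pool one-vs-rest labels and scores over all classes, then compute one AUC."""
    flat_labels = [int(y == k) for y in y_true for k in range(n_classes)]
    flat_scores = [probs[k] for probs in y_prob for k in range(n_classes)]
    return binary_auc(flat_labels, flat_scores)


# Hypothetical toy predictions for four nodules (rows sum to 1).
y_true = [0, 1, 2, 2]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.3, 0.4, 0.3],
]
micro_auc = micro_average_auc(y_true, y_prob, n_classes=3)
```

Micro-averaging weights every nodule equally, so classes with more nodules contribute more to the pooled score than rare classes do.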

Conclusions

The machine learning algorithms demonstrated reasonable performance in differentiating benign versus preinvasive versus invasive adenocarcinoma from computed tomography images alone. However, prediction accuracy varied across the adenocarcinoma subtypes. These algorithms hold the potential for improved diagnostic capabilities with less-invasive means.

Graphical abstract

Machine learning algorithms can differentiate benign versus preinvasive versus IA from CT images alone. A dataset of chest CT scans containing lung nodules was collected, along with their pathologic diagnosis from several sources. Two different machine learning algorithms were developed, one using the SVM model and one using deep learning technology, a CNN. These 2 algorithms were independently trained and validated on 2516 lung nodules from the training dataset. They were then tested on a separate independent set of CT scans containing 640 nodules with ROC analysis used to evaluate their performance. They were able to discriminate lung nodules as benign, IA, or preinvasive subtypes. These algorithms hold potential for improved diagnostic capabilities with less-invasive means.

Key Words

classification
computed tomography
lung adenocarcinoma
pathological subtype

Abbreviations and Acronyms

AAH = atypical adenomatous hyperplasia
AIS = adenocarcinoma in situ
AUC = area under the curve
CNN = convolutional neural network
CT = computed tomography
HU = Hounsfield Units
IA = invasive adenocarcinoma
IASLC = International Association for the Study of Lung Cancer
MIA = minimally invasive adenocarcinoma
NLST = National Lung Screening Trial
ROC = receiver operating characteristic
SVM = support vector machine
UPMC = University of Pittsburgh Medical Center
WHO = World Health Organization

Institutional Review Board approval: Ethics Committee at the Affiliated Zhongshan Hospital of Dalian University (Study #2019136). Ethics Committee at the Fourth Affiliated Hospital of Hebei Medical University (Study #2020KY082). University of Pittsburgh IRB (Study #20010242) 02/27/2020.