Automated detection of pulmonary nodules in PET/CT images: Ensemble false-positive reduction using a convolutional neural network technique

PURPOSE
Automated detection of solitary pulmonary nodules using positron emission tomography (PET) and computed tomography (CT) images shows good sensitivity; however, it is difficult to detect nodules in contact with normal organs, and additional efforts are needed so that the number of false positives (FPs) can be further reduced. In this paper, the authors propose an improved FP-reduction method for the detection of pulmonary nodules in PET/CT images by means of convolutional neural networks (CNNs).


METHODS
The overall scheme detects pulmonary nodules using both CT and PET images. In the CT images, a massive region is first detected using an active contour filter, which is a type of contrast enhancement filter that has a deformable kernel shape. Subsequently, high-uptake regions detected by the PET images are merged with the regions detected by the CT images. FP candidates are eliminated using an ensemble method; it consists of two feature extractions, one by shape/metabolic feature analysis and the other by a CNN, followed by a two-step classifier, one step being rule based and the other being based on support vector machines.


RESULTS
The authors evaluated the detection performance using 104 PET/CT images collected by a cancer-screening program. The sensitivity in detecting candidates at an initial stage was 97.2%, with 72.8 FPs/case. After performing the proposed FP-reduction method, the sensitivity of detection was 90.1%, with 4.9 FPs/case; the proposed method eliminated approximately half the FPs existing in the previous study.


CONCLUSIONS
An improved FP-reduction scheme using CNN technique has been developed for the detection of pulmonary nodules in PET/CT images. The authors' ensemble FP-reduction method eliminated 93% of the FPs; their proposed method using CNN technique eliminates approximately half the FPs existing in the previous study. These results indicate that their method may be useful in the computer-aided detection of pulmonary nodules using PET/CT images.


INTRODUCTION
Lung cancer is the leading cause of cancer-related deaths among men. 1 Consequently, early detection is essential for decreasing the number of cancer-related deaths. X-ray computed tomography (CT) was recently adopted as a mass-screening tool for lung cancer diagnosis, 2 enabling rapid improvement in the ability to detect tumors early. According to the results from the National Lung Screening Trial, 3 screening using low-dose CT decreases lung cancer-related deaths among smokers by 20%. Therefore, it is expected that a greater number of CT examinations will be adopted for lung screening in the future. Of late, positron emission tomography (PET)/CT is also being used as a cancer screening tool. 4,5 PET/CT is an imaging technique that provides both metabolic and anatomical information; therefore, it is also useful for the early detection of lung cancer. However, radiologists must examine a large number of images.
Computer-aided detection () provides a computerized report as a second opinion to assist a radiologist in a diagnosis and is expected to assist radiologists who are required to evaluate a large number of images to identify lesions and arrive at a diagnosis.In this study, we focused on the automated detection of pulmonary nodules using PET/CT images.PET/CT examination was conducted for both screening and localization of tumors.Therefore, we first develop the  scheme for nodule detection in PET/CT images to be applied for both objectives.

1.A. Related works
Many lung tumor segmentation methods have been proposed for CAD schemes for PET/CT images. 6 Ballangan et al. reported lung tumor segmentation by using graph cuts and showed that the volume measurement error between the segmentation result and manual delineations was smaller than that of conventional methods. 9 Wang et al. developed a tumor delineation method by using tumor-background likelihood models in PET/CT images. 8 In an evaluation using 40 patients, the dice similarity coefficient (DSC) was 0.80 for the simple group (nodules whose boundaries were clear) and 0.77 for the complex group (nodules that abutted or extended into normal tissue). Cui et al. proposed a lung tumor segmentation method in PET/CT images by using intensity graphing and topology graphing. 7 Shape similarities between the segmentation result and manual delineations, measured using DSC, were 0.88 for the simple group and 0.87 for the complex group.
1][12] Cui et al. reported hot-spot detection using thresholding and found that 96.7% of hot spots were detected correctly. 10 Song et al. studied lesion detection and characterization in the lung by using context-driven approximation; out of 158 hot-spot lesions, 157 lesions were detected. 12 Most of the methods described in these studies detected pulmonary nodules and masses from PET images alone; CT images were only used for identification of the lung region because the image quality of CT images for attenuation correction was insufficient. Improvements in modern PET/CT scanning technology have improved the quality of CT images, and radiologists currently rely on CT images to detect well-differentiated pulmonary nodules, which can be detected only from CT images. Because radiologists currently identify nodules from both PET and CT images, it is better to detect nodules through CAD using both PET and CT images.
In our first study, however, an automated scheme for the detection of pulmonary nodules making use of both CT and PET images was developed. 13 The detection sensitivity was 90.0%, and the corresponding number of false-positive (FP) detections per case was 17.0. Furthermore, we improved detection capability by using a nodule-enhancing method with an active contour technique and showed that the sensitivity of detection was 90.0%, with 9.8 FPs/case. 14 However, the number of FPs remains high. For practical use, therefore, a further improvement in the FP-reduction technique is strongly required.

1.B. Objective
In this study, we propose an improved FP-reduction scheme for the detection of pulmonary nodules in PET/CT images. A major objective of our study is to develop an ensemble FP-reduction method that combines a convolutional neural network (CNN), which has attracted attention in the artificial intelligence and brain science fields, with the conventional method using shape/metabolic features.
In this paper, the architecture of the improved FP-reduction scheme for the detection of pulmonary nodules is described. In addition, the detection performance as evaluated with the original PET/CT image database is discussed.

2.A. Overview
The outline of our overall scheme for the detection of pulmonary nodules is shown in Fig. 1. First, initial nodule candidates were identified separately on the PET and CT images using the algorithm specific to each image type. Subsequently, candidate regions obtained from the two images were combined. FPs contained in the initial candidates were eliminated by an ensemble method using multistep classifiers on characteristic features obtained by a shape/metabolic analysis and a CNN.

2.B. Initial nodule detection
With regard to the detection in CT images, the massive region was first enhanced using an active contour filter (ACF), 14 which is a type of contrast-enhancement filter that has a deformable kernel shape. The active contour consists of several nodes that are connected to each other. We define the evaluation function of the active contour as the maximum pixel value on the connected lines. The nodes move iteratively in order to minimize the evaluation function. Thus, the active contour encloses the nodule without touching normal organs such as blood vessels and the lung wall. The final output of the ACF is the difference between the maximum pixel value on the active contour and the pixel value at the center of the filter kernel. Detailed procedures and figures pertaining to ACF are shown in our previous report. 14 Applying ACF to the original image causes pixel values to rise for massive structures and drop on continuous structures (such as blood vessels and lung walls). The initial nodule regions were segmented by thresholding the enhanced images followed by labeling.
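The final segmentation step (thresholding followed by labeling) can be illustrated with a minimal sketch. It is not the in-house implementation: it works on a single 2D slice, assumes 4-connectivity, and the function name `threshold_and_label` and the toy pixel values are hypothetical. The ACF enhancement itself is not reproduced here.

```python
from collections import deque


def threshold_and_label(image, threshold):
    """Binarize `image` at `threshold`, then label 4-connected regions.

    Returns (labels, n_regions); label 0 means background.
    Connectivity and 2D processing are assumptions of this sketch.
    """
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    n_regions = 0
    for y in range(h):
        for x in range(w):
            if image[y][x] < threshold or labels[y][x]:
                continue
            # Start a new region and flood-fill it breadth-first.
            n_regions += 1
            labels[y][x] = n_regions
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and image[ny][nx] >= threshold):
                        labels[ny][nx] = n_regions
                        queue.append((ny, nx))
    return labels, n_regions


# Toy "enhanced" slice with two separate bright blobs (hypothetical values).
enhanced = [[9, 9, 0, 0],
            [0, 0, 0, 7],
            [0, 0, 0, 8]]
labels, n_regions = threshold_and_label(enhanced, threshold=5)
```

Each labeled region then corresponds to one initial CT candidate.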
The PET images were subsequently binarized using a predetermined threshold to detect regions of increased uptake. Here, candidate regions other than the lungs were eliminated using the lung regions obtained by CT images.
Initial candidate regions detected on CT and PET were represented as binary images. The two images were then combined using the logical OR function. Following pixel-by-pixel confirmation of regions on both images, a region detected by at least one modality was treated as an initial candidate region.
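The PET binarization and the OR-based merge can be sketched as follows for one 2D slice. This is an illustrative sketch only: the function names, the toy arrays, and the inclusive comparison (`>=`) are assumptions; the threshold of 2.0 is the value reported in Sec. 3.A.

```python
SUV_THRESHOLD = 2.0  # PET threshold reported in Sec. 3.A


def binarize_pet(suv_slice, lung_mask, threshold=SUV_THRESHOLD):
    """Mark pixels whose SUV reaches the threshold and that lie inside the lungs."""
    return [
        [1 if (suv >= threshold and lung) else 0
         for suv, lung in zip(suv_row, lung_row)]
        for suv_row, lung_row in zip(suv_slice, lung_mask)
    ]


def merge_candidates(ct_mask, pet_mask):
    """Pixel-by-pixel logical OR of the CT and PET candidate masks."""
    return [
        [1 if (c or p) else 0 for c, p in zip(ct_row, pet_row)]
        for ct_row, pet_row in zip(ct_mask, pet_mask)
    ]


# Hypothetical toy data.
ct_mask = [[0, 1, 0],
           [0, 1, 0],
           [0, 0, 0]]
suv = [[0.5, 1.0, 3.1],
       [0.4, 2.5, 2.2],
       [6.0, 0.3, 0.1]]
lung = [[1, 1, 1],
        [1, 1, 1],
        [0, 1, 1]]  # bottom-left pixel lies outside the lungs

pet_mask = binarize_pet(suv, lung)
merged = merge_candidates(ct_mask, pet_mask)
```

In this toy case the high-uptake pixel outside the lung mask is discarded, while every pixel flagged by either modality inside the lungs survives the merge.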

2.C.1. Outline
FPs included in the initial candidates are composed mainly of narrow bronchi and blood vessels in the lung. In addition, most of the FPs in the initial candidates in PET images are due to the physiological uptakes in myocardial and liver regions adjacent to the lungs. Therefore, the integration of both shape features from CT images and metabolic features from PET images can be considered to eliminate the FPs. 13 However, some FPs have image features that are similar to those of nodules, as shown in Fig. 2; the shape and metabolic features used in our previous method could not represent these cases well enough to differentiate and eliminate them.
In order to eliminate such FPs while maintaining the number of true positives (TPs), this study focused on CNN, which is a type of deep learning architecture. 15 CNN was inspired by biological processes and specifically designed to emulate the behavior of visual systems. CNN has the capability to learn representations of input data by using multiple levels of feature extraction. In some image recognition trials, results were dramatically improved using CNN. 16,17 Studies indicated that CNN might be used to reduce FPs by generating novel valid features that were not generated by the shape and metabolic features used in conventional FP-reduction methods. Therefore, a novel ensemble FP-reduction method was developed for this study by incorporating the CNN technique into our previous FP-reduction technique that used shape and metabolic features.

2.C.2. Classification using convolutional neural network
The architecture of the CNN used for FP-reduction is shown in Fig. 3. It consists of three convolution layers, three pooling layers, and two fully connected layers. The pixel values associated with initial candidate regions in the CT and PET images are given to the input layer of the CNN. The candidate region consists of two kinds of 3D images, so the total number of pixels in the candidate region exceeds 54 000 (30 × 30 × 30 × 2) when a side of the candidate region is 30 mm or greater. Having a large number of input units causes slow convergence of training and high computational costs. Therefore, representative 2D images are generated. Axial and sagittal images are introduced for CT images. As PET images have poor anatomical information and low spatial resolution, it is more important to identify the existence of high-uptake regions than the 3D structure. Therefore, we used a maximum-intensity projection (MIP) image along the body axis in PET images.
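The reduction from two 3D volumes to three 2D images can be sketched as below. This is a minimal illustration, not the authors' code: volumes are indexed as `vol[z][y][x]` with z along the body axis, the slice positions passed to the helpers are chosen by the caller (the text does not state which slices are used), and all function names are hypothetical.

```python
def axial_slice(vol, z):
    """Return the x-y plane at index z."""
    return [row[:] for row in vol[z]]


def sagittal_slice(vol, x):
    """Return the z-y plane at column index x."""
    return [[row[x] for row in plane] for plane in vol]


def mip_along_body_axis(vol):
    """Maximum-intensity projection along z (the body axis)."""
    depth, height, width = len(vol), len(vol[0]), len(vol[0][0])
    return [
        [max(vol[z][y][x] for z in range(depth)) for x in range(width)]
        for y in range(height)
    ]


# Tiny 2 x 2 x 2 toy volume (hypothetical values).
vol = [[[1, 2], [3, 4]],
       [[5, 0], [1, 8]]]
ct_axial = axial_slice(vol, 0)
ct_sagittal = sagittal_slice(vol, 1)
pet_mip = mip_along_body_axis(vol)

# Input-size comparison using the numbers quoted in the text.
n_inputs_3d = 30 * 30 * 30 * 2   # two 3D candidate volumes
n_inputs_2d = 32 * 32 * 3        # three resized 2D images
```

With the sizes quoted in the text, the 2D representation needs 3072 input values instead of 54 000.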
For the CNN computations, we used the Caffe package, which is a deep learning framework developed by the Berkeley Vision and Learning Center. 18 The input to the first convolutional layer is (32 × 32) × 3 images; CT-axial, CT-sagittal, and PET-MIP images are resized to 32 × 32 pixels. Convolution layer 1 uses 32 filters with a 5 × 5 × 3 kernel, resulting in a feature map of 32 × 32 × 32 pixels. Pooling layer 1 conducts subsampling (resampling) that outputs the maximum value in a 3 × 3 kernel for every 2 pixels, reducing the matrix size of the feature map to 16 × 16 × 32. After three convolution layers and three pooling layers, there are two fully connected layers consisting of a multilayer perceptron. After all layers are completed, the probabilities of TP and FP are obtained from output. By performing the training of the convolution and fully connected layers, two separate outputs represent the probabilities of judgment for FP and TP.

FIG. 5. Histogram of probabilities of FPs and TPs.
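The feature-map sizes quoted for the first convolution and pooling layers can be checked with a short calculation. The sketch below rests on assumptions not stated in the text: a convolution stride of 1 with zero padding of 2 (needed for a 5 × 5 kernel to preserve the 32 × 32 size), and ceiling rounding in the pooling layer (needed for a 3 × 3 window with stride 2 to yield 16 rather than 15). The helper names are hypothetical.

```python
import math


def conv_output_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_output_size(size, kernel, stride, pad=0):
    """Spatial output size of a pooling layer with ceiling rounding (assumed)."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


input_size = 32       # CT-axial, CT-sagittal, and PET-MIP resized to 32 x 32
n_filters = 32        # filters in convolution layer 1

conv1 = conv_output_size(input_size, kernel=5, stride=1, pad=2)  # pad=2 is an assumption
pool1 = pool_output_size(conv1, kernel=3, stride=2)

conv1_shape = (conv1, conv1, n_filters)
pool1_shape = (pool1, pool1, n_filters)
```

Under these assumptions the shapes agree with the 32 × 32 × 32 and 16 × 16 × 32 feature maps described above; the later layers are not specified in the text and are therefore left out.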

2.C.3. Shape and metabolic feature extraction
For each candidate region, shape and metabolic features are calculated. A total of 18 features are obtained from the CT images, including sectional areas in the three planes (X-Y, X-Z, and Y-Z), volume, surface area, contour pixels in the three planes, compactness, convergence in the three planes, and CT values (max, center, standard deviation) in the candidate region. 19 A total of eight metabolic features are obtained from the PET images, including the standardized uptake value (SUV) at the center of the candidate region and the maximum and mean values of SUV in the candidate region, 13 and sectional areas in the three planes, volume, and surface area in the candidate region.

2.C.4. Classification using rule-based and support vector machine (SVM) classifiers
Using the probabilities of FP and TP given by the CNN together with the shape and metabolic features, FPs in the initial candidate regions are eliminated. First, the shape and metabolic features are given to the rule-based classifier in order to eliminate the obvious FPs. 13 In this step, FPs are identified by a simple method of providing low and high limits for each feature (e.g., candidates whose vector concentration value was <0.4 were judged as FPs).
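A rule-based stage of this kind can be sketched as a table of per-feature limits. Only the vector-concentration limit of 0.4 comes from the text; the feature key names, the second rule, and the function name are hypothetical and serve only to show how a high limit would be expressed.

```python
# (low_limit, high_limit) per feature; None means "no limit on that side".
RULES = {
    "vector_concentration": (0.4, None),  # from the text: values < 0.4 are FPs
    "volume_mm3": (None, 50000.0),        # hypothetical high limit, for illustration
}


def is_obvious_fp(features, rules=RULES):
    """Return True if any feature falls outside its allowed range."""
    for name, (low, high) in rules.items():
        value = features[name]
        if low is not None and value < low:
            return True
        if high is not None and value > high:
            return True
    return False


candidate_a = {"vector_concentration": 0.35, "volume_mm3": 900.0}
candidate_b = {"vector_concentration": 0.62, "volume_mm3": 900.0}

remaining = [c for c in (candidate_a, candidate_b) if not is_obvious_fp(c)]
```

Candidates that pass every limit are the ones handed to the subsequent classifier.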
The remaining candidate regions are then given to the two SVMs, where TPs and FPs are classified. The initial candidate regions detected only by CT images indicate that there are no high-uptake regions in PET images. Therefore, many features obtained by PET are set to zero. In contrast, as for the initial candidates detected by PET images, morphological changes are usually observed in CT images. Because the properties of the obtained features and the number of effective features are different under the two conditions, two SVMs were introduced. A total of 22 features (including all features obtained by CT, three SUV features by PET, and CNN output) are given to the first SVM, SVM #1, and all features obtained by the proposed method are given to the second SVM, SVM #2, based on the above discussion.
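The split into two feature vectors can be sketched as follows. Several points are our reading rather than statements from the text: that SVM #1 handles candidates detected only on CT and SVM #2 handles candidates detected on PET, that the CNN contributes a single value (which is what makes 18 + 3 + 1 = 22), and that the three SUV features come first in the PET feature list. The function name is hypothetical.

```python
N_CT_FEATURES = 18   # shape/intensity features from CT (Sec. 2.C.3)
N_PET_FEATURES = 8   # metabolic features from PET (Sec. 2.C.3)
N_SUV_FEATURES = 3   # SUV at center, maximum SUV, mean SUV


def build_svm_input(ct_features, pet_features, cnn_output, detected_by_pet):
    """Choose the SVM and assemble its feature vector for one candidate."""
    if len(ct_features) != N_CT_FEATURES or len(pet_features) != N_PET_FEATURES:
        raise ValueError("unexpected feature count")
    if detected_by_pet:
        # Assumed routing: PET-detected candidates use every feature.
        return "SVM #2", list(ct_features) + list(pet_features) + [cnn_output]
    # Assumed routing: CT-only candidates keep just the three SUV features.
    suv_only = list(pet_features[:N_SUV_FEATURES])
    return "SVM #1", list(ct_features) + suv_only + [cnn_output]


name_ct, vec_ct = build_svm_input([0.0] * 18, [0.0] * 8, 0.9, detected_by_pet=False)
name_pet, vec_pet = build_svm_input([0.0] * 18, [1.0] * 8, 0.2, detected_by_pet=True)
```

Under these assumptions the CT-only vector has the 22 entries quoted in the text, and the full vector has 27.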

3.A. Clinical data and environment
A total of 104 Japanese men and women who underwent whole-body PET/CT during cancer screening programs from 2009 to 2012 were included in this study. Scanning was performed using a Siemens unit (TruePoint Biograph 40) with standard settings that are routinely used in the clinic. The spatial resolution of the PET images was 4.0 × 4.0 × 2.0 mm³, while that of the CT images was 0.97 × 0.97 × 2.0 mm³. Before the automated detection, we enlarged the image matrix of the PET images using linear interpolation so that the field of view and pixel resolution became equivalent to those of the CT images. A total of 183 nodules were detected in 84 patients. The average values for diameter, CT value, and SUV max of these nodules were 18.9 ± 15.6 mm, 25.3 ± 384.4, and 4.01 ± 4.70, respectively. Examples of nodules in the image database are shown in Fig. 4. The center coordinate (x-y-z) of nodules is provided by the radiologist.
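The matrix enlargement by linear interpolation can be illustrated in one dimension. This is a simplified sketch, not the procedure actually used: it resamples a single line of PET values from the PET pixel spacing to the CT pixel spacing, whereas the real images require the same operation along each in-plane axis. The function name and the sample values are hypothetical.

```python
def resample_linear(values, src_spacing, dst_spacing):
    """Resample a 1D profile from src_spacing to dst_spacing (same units)."""
    length = (len(values) - 1) * src_spacing
    n_out = int(length // dst_spacing) + 1
    out = []
    for i in range(n_out):
        pos = i * dst_spacing / src_spacing   # position in source-index units
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out


# Hypothetical SUV profile sampled every 4.0 mm, resampled to 0.97 mm.
pet_line = [0.0, 4.0, 8.0]
fine_line = resample_linear(pet_line, src_spacing=4.0, dst_spacing=0.97)

# A round spacing makes the interpolated values easy to verify by hand.
check_line = resample_linear(pet_line, src_spacing=4.0, dst_spacing=1.0)
```

With a 1.0 mm target spacing the ramp from 0 to 8 is reproduced at every millimetre, which is the expected behavior of linear interpolation.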
The data pertaining to candidate regions were randomly divided into five data sets and evaluated using the cross-validation method. A candidate nodule was considered correctly detected if the center coordinates of the nodule marked by a doctor existed inside the candidate region (area) obtained by the proposed method. In the method of Paik et al., 20 the candidate was judged as a TP if the center of the nodule marked by a doctor existed within a predetermined distance from the candidate region. The criterion of our proposed method was stricter than that of Paik's method because we set the distance margin equal to zero. An initial candidate region was considered to be an FP when no registered nodules were assigned to the region. With regard to the detection parameters, the maximum filter radius of ACF was set at 25 mm, while the number of nodes was set at 8. For detection on PET, the threshold was set at 2.0. These parameters were determined in the previous paper, based on preliminary experiments and the knowledge of radiologists. The calculation of the automated detection was performed using in-house CAD software on an Intel Core i7-6700K processor (4 CPU cores, 4 GHz) with 16 GB of DDR4 memory.
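The two hit criteria and the five-way split can be sketched as follows. This is an illustration under assumptions: candidate regions are represented as sets of integer voxel coordinates, distances are measured in voxel units, and the random split uses a fixed seed for repeatability. None of the names below come from the original software.

```python
import math
import random


def is_tp_strict(center, region_voxels):
    """Criterion used here: the marked center must lie inside the region."""
    return tuple(center) in region_voxels


def is_tp_with_margin(center, region_voxels, margin):
    """Paik-style criterion: the center lies within `margin` of the region."""
    return any(math.dist(center, voxel) <= margin for voxel in region_voxels)


def five_fold_split(items, seed=0):
    """Randomly divide items into five data sets for cross-validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::5] for i in range(5)]


region = {(10, 10, 5), (11, 10, 5)}          # hypothetical candidate region
inside = is_tp_strict((11, 10, 5), region)
outside_strict = is_tp_strict((13, 10, 5), region)
outside_margin = is_tp_with_margin((13, 10, 5), region, margin=2.0)

folds = five_fold_split(range(12))
```

The second center is rejected by the zero-margin rule but accepted with a margin of two voxels, which is why the zero-margin criterion is the stricter one.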
For the ensemble FP-reduction method, the training of the CNN was conducted using the dedicated training program bundled in the Caffe package, which is accelerated by a GPU (NVIDIA GeForce GTX 970 with 4 GB of memory). Final classification using SVMs was calculated using LIBSVM. 21 We used C-support vector classification as the SVM algorithm and the radial basis function as the kernel function.
This study was approved by an institutional review board, and patient agreements were obtained given the condition that all data were anonymized.

3.B. Detection results
In the initial detection, among 181 nodules, 163 and 80 nodules were detected by CT images and PET images, respectively. Among these detected nodules, 67 nodules were detected by both images; total sensitivity was 97.2% (176/181). In addition, 7575 FPs were contained in the initial candidates, so the number of FPs/case was 72.8 (7575/104).
With regard to the FP elimination step, the histogram of the probabilities of TP and FP is shown in Fig. 5. Here, since the number of FPs was much larger than that of TPs, histogram values were normalized by the numbers of TPs and FPs. The result was that about 70% of TPs and FPs were identified correctly. The rule-based classifier using both the output of the CNN and the shape/metabolic features removed 4777 FPs, so the number of FPs/case was reduced to 26.9 [(7575 − 4777)/104] without any drop in TPs.
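The counts reported above can be cross-checked with a few lines of arithmetic. The sketch uses only numbers quoted in the text; the variable names are ours.

```python
n_nodules = 181
n_cases = 104

detected_ct = 163
detected_pet = 80
detected_both = 67

# Inclusion-exclusion: nodules found by at least one modality.
detected_any = detected_ct + detected_pet - detected_both

initial_sensitivity = detected_any / n_nodules

initial_fps = 7575
removed_by_rules = 4777

fps_per_case_initial = initial_fps / n_cases
fps_per_case_after_rules = (initial_fps - removed_by_rules) / n_cases
```

The union of the CT and PET detections gives the 176 nodules behind the quoted sensitivity, and the two FP rates round to the values stated in the text.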
The free-response receiver operating characteristic (FROC) curves obtained by changing the SVM parameters (cost, weight) are shown in Fig. 6. In order to conduct a comparative evaluation, the FROC curve of the previous method is also shown. The sensitivity of our proposed method was 90.1%, with 4.9 FPs/case. All FPs shown in Fig. 2 were successfully eliminated using the proposed method under these conditions. Furthermore, two examples of nodules detected by our proposed method are indicated in Fig. 7. In the figure, if the center of the nodule marked by the doctor exists inside the red contour (candidate region), the candidate was judged as a TP.
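One way to obtain the points of such a curve is to compute a (FPs/case, sensitivity) pair for each classifier setting. The sketch below assumes that each candidate carries the identifier of the nodule it hits (or `None` for an FP) and that each SVM setting yields a keep/reject decision per candidate; the data and names are hypothetical toy values, not results from this study.

```python
def froc_point(nodule_ids, kept, n_nodules, n_cases):
    """Return (FPs per case, sensitivity) for one classifier setting."""
    detected = {nid for nid, k in zip(nodule_ids, kept) if k and nid is not None}
    n_fp = sum(1 for nid, k in zip(nodule_ids, kept) if k and nid is None)
    return n_fp / n_cases, len(detected) / n_nodules


# Toy candidates: nodule id for hits, None for false positives.
nodule_ids = [0, 0, 1, None, None, None, 2, None]

# Keep/reject decisions under two hypothetical SVM parameter settings.
setting_loose = [1, 1, 1, 1, 1, 1, 1, 1]
setting_strict = [1, 0, 1, 0, 1, 0, 0, 0]

points = sorted(
    froc_point(nodule_ids, kept, n_nodules=3, n_cases=2)
    for kept in (setting_loose, setting_strict)
)
```

Several candidates hitting the same nodule count once toward sensitivity, which is why the detected nodules are collected in a set.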
Examples of nodules missed by our proposed method are shown in Fig. 8. In the figure, the left and center images show the nodules missed at the initial detection stage, and the right image shows the nodule missed at the FP-reduction stage. Examples of FP regions output by the proposed method are shown in Fig. 9. The average and standard deviation of the diameters of FPs on the CT images were 7.86 and 4.89 mm, respectively.
In the present system, initial detection of nodules from CT and PET images was performed automatically by the in-house software. Ensemble FP-reduction was carried out using the Caffe package. 18 There were no manual interactions involved that required judgment of the operator. The processing time for initial detection using ACF was approximately 340 s/case, and ensemble FP reduction took 35 s/case. The CNN training took approximately 24 min using the GPU.

DISCUSSION
The number of detected nodules in the initial detection was 176, and its sensitivity exceeded 97%, so satisfactory performance was obtained. However, it was accompanied by a large number of FPs at initial detection. Here, most of the undetected nodules were subtle nodules with low CT values and low uptake of fluorodeoxyglucose (FDG). In Fig. 5, approximately 70% of the CNN output showed the correct decision for TPs and FPs; however, it was not better than our previous methods 13,14 using characteristic features and three SVMs. As for the overall performance using the ensemble FP-reduction technique with CNN and shape/metabolic features as shown in Fig. 6, sensitivity exceeded 90%, with 4.9 FPs/case. Most of the FPs found in our CAD scheme appeared to be easily eliminated by radiologists. The proposed CAD scheme therefore appears to be useful for image diagnosis by radiologists during screening and follow-up examinations.
When the proposed method and conventional methods are compared in terms of FPs per case at the same detection sensitivity, our previous study 14 without CNN yielded 9.8 FPs/case. Our ensemble FP-reduction method using the CNN technique eliminates approximately half the FPs existing in the previous study. Given that the detection sensitivity is already sufficiently high, this improvement is a noteworthy advance. These results indicate that CNN has good properties and can thus be employed as one of the characteristic features effective for FP reduction.
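The two reduction figures quoted in this paper follow directly from the reported FP rates, as the short check below shows. Only numbers stated in the text are used; the variable names are ours.

```python
fps_initial = 72.8    # FPs/case after initial detection
fps_previous = 9.8    # FPs/case of the previous method (Ref. 14)
fps_proposed = 4.9    # FPs/case of the proposed ensemble method

# Fraction of initial FPs removed by the ensemble FP-reduction step.
eliminated_fraction = 1.0 - fps_proposed / fps_initial

# FP rate of the proposed method relative to the previous method.
ratio_to_previous = fps_proposed / fps_previous
```

The first quantity corresponds to the 93% elimination stated in the abstract, and the second to the statement that about half of the previous method's FPs remain.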
For further improvement of detection, FPs at the initial detection step should be reduced. Following this, the classification performance of the CNN should be improved, which may be accomplished by optimizing the CNN architecture, including the number of layers used. A number of researchers are studying novel CNN architectures, and performance should be improved while following these trends in CNN techniques.
A limitation of this study is the small cohort used. It is necessary to perform the evaluation using a large cohort in the future. Furthermore, the practical usefulness of this method should be evaluated by performing an ROC analysis of radiologists with and without the proposed CAD scheme.

CONCLUSIONS
In this study, an ensemble FP-reduction technique has been proposed using conventional shape/metabolic features and a CNN technique. Evaluation results showed that the sensitivity exceeded 90%, with 4.9 FPs/case; the proposed method using CNN techniques eliminates approximately half the FPs existing in the previous study. These results indicate that our improved method may be useful for computer-aided detection of lung tumors in clinical practice.

F. 1 .
Proposed overall scheme for detecting lung nodules in PET/CT images.Medical Physics, Vol.43, No. 6, June 2016 F. 2. Difficult examples of FPs along with their contours, all of which were not removed by our previous FP-reduction method (Ref.14).However, all of them were correctly identified as FPs by our new method.

F. 3 .
Architecture of CNN.Medical Physics, Vol.43, No. 6, June 2016 F. 4. Examples of nodules in the image database.Pairs of transverse views of CT (left) and PET (right) images are shown.Arrows indicate the nodules.

F. 6 .
FROC curves of the proposed and previous methods.Dashed line shows the previous method using an active contour filter without ensemble FP reduction (Ref.14).Medical Physics, Vol.43, No. 6, June 2016 F. 7. Examples of nodules detected by the proposed method.Pairs of transverse views of CT (left) and PET (right) images are shown, in which the detected candidates' contours determined by CT or PET are marked by red lines.

F. 8 .
Examples of nodules missed by the proposed method.Pairs of transverse views of CT (left) and PET (right) images are shown.Arrows indicate missed regions.Medical Physics, Vol.43, No. 6, June 2016 F. 9. Examples of FPs together with their contours, all of which were not eliminated by the proposed method.