Content Based Image Retrieval System for Lung Cancer Detection Using Neural Network and Circular Hough Transform

Computer Aided Detection (CAD) systems automatically detect and localize lung nodules in CT scans. A major problem in such systems is the large number of false positives, because there is no provision for comparing the predicted output. This paper proposes a new system that combines CBIR and a neural network to fill the gap in the area of early detection of lung cancer. From the preprocessed CT scan image, the system identifies whether it contains nodules using the Circular Hough Transform and classifies each nodule as benign or malignant using a Probabilistic Neural Network. It then searches the database for the most similar images and retrieves them. From the retrieved images, it is easy to identify the present cancer stage of the patient. Experiments were carried out on both the LIDC database and a locally collected database, and the performance of the system was evaluated on both. The experimental results show that the proposed system differentiates benign and malignant nodules with 97% accuracy on the LIDC dataset and 95% accuracy on the local dataset, and that similar images are retrieved from the available database, together with their present stage, with high precision and recall.


INTRODUCTION
Lung cancer is ranked as the second most common form of cancer worldwide. Radiologists nowadays use computed tomography (CT) scans of the chest to detect lung tumors, as CT has high sensitivity and a low error rate. However, even with CT scans, detecting and labeling lung tumors requires a certain amount of time and experience. Moreover, given the large number of cases that radiologists have to analyze on a daily basis, they experience pressure and a heavy workload. One way to reduce this pressure and workload is a computer-aided detection (CAD) system, which automatically detects and localizes lung nodules in CT images. Such systems assist radiologists in the process of lung tumor detection. They have many benefits, such as reducing the error rate of nodule detection, reducing the operation time and detecting tumors that are overlooked by radiologists. Several studies have shown that CAD systems offer a useful second opinion. However, one problem with these systems is the large number of false positives that may occur during detection. To address this, a content-based medical image retrieval system is used. First, a database is prepared in which images are labeled benign or malignant using neural network detection. Then, when a query image is fed to the system, the system classifies it as benign or malignant and retrieves the most similar image from the database along with the cancer stage. Such a system can be of considerable help in the medical field.
In a recent paper (Dandıl, 2018), nodule detection and classification were done simultaneously. Another advantage of that work is the use of a SOM-based method during segmentation to detect small nodules in the lung. The model of Sori et al. (2019) is flexible: it can handle variation in the shape and size of nodules, which is used to adjust the class labels properly. Region growing algorithms have been implemented to detect nodules from a CT scan image of the lungs, and CT-based pulmonary parenchyma detection has been suggested for obtaining accurate and effective results (Chaudhary and Singh, 2012). With the selected size of the structuring element, large juxta-pleural nodules are identified and eliminated, so that approach is suitable for detecting isolated solid nodules (John and Mini, 2016). The proposed method aims to handle this in a better way.
Here we describe a computer-aided scheme for diagnosing lung nodules and propose a method to evaluate the similarity between a query lung nodule and the reference lung nodule dataset. In the initial stage, a preprocessed CT image is fed to the system, the lung mask is extracted using the LUVEM algorithm, and nodules are identified using the Circular Hough Transform (CHT). Then, three groups of features are extracted, and based on these the nodules are classified as benign or malignant using a probabilistic neural network. Lastly, a CBIR approach is proposed to retrieve from the database the images most similar to the query, and classification accuracy is used as the performance assessment index. The performance evaluation shows that the accuracy of the proposed system is more than 97% and 95% on the standard database and the locally collected database, respectively. The images are retrieved with high precision and recall.

Proposed work
The proposed system consists of a chain of stages: data acquisition, preprocessing, lung volume extraction, nodule detection after segmentation, nodule classification and finally content-based image retrieval, as shown in Figure 1.

Image Acquisition
In the image preprocessing stage, that is the initial stage of the proposed work in which reading

Preprocessing
The goal of the image preprocessing step is to prevent misleading results in the subsequent processes. Here, the input image is sharpened. Sharpness is the contrast between different colors: a quick transition from black to white, or vice versa, looks sharp, whereas a gradual transition from black to gray to white looks blurry.
An image is considered sharp when the contrast along the edges, where different colors meet, is clear. Sharpening returns an enhanced version of the true-color or grayscale input image in which the edges, a common image feature, have been sharpened using unsharp masking (USM), a common image sharpening technique in digital image processing.
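As a minimal sketch of the idea behind unsharp masking, the following Python fragment subtracts a blurred copy of the image from the original and adds the scaled difference back. The 3x3 box blur, the `amount` parameter and the function names are assumptions made for illustration; practical implementations usually use a Gaussian blur, and this is not the code used in the study.

```python
def box_blur(img):
    """3x3 mean blur with edge replication; img is a list of rows of gray levels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the image border
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def unsharp_mask(img, amount=1.0):
    """Sharpen: original + amount * (original - blurred), clipped to [0, 255]."""
    blurred = box_blur(img)
    return [
        [min(255.0, max(0.0, p + amount * (p - b))) for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]


# A vertical edge between gray levels 50 and 200 (illustrative values).
edge = [[50, 50, 200, 200] for _ in range(4)]
sharpened = unsharp_mask(edge)
```

Flat regions are left unchanged, while the pixels on either side of the edge are pushed apart, which is the increase in edge contrast described above.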

Lung Volume Extraction
Using morphological image-processing operations, the lung lobes are extracted from the acquired CT images. Unrelated segments on the edges and sides of the image are removed at the time of preprocessing using the LUVEM algorithm.

Lung Volume Segmentation
The nodule segmentation is the next step of nodule detection. The nodule segmentation is done using the K-means clustering algorithm.
The steps of the K-means clustering algorithm are as follows:
1. Place K points into the space represented by the objects that are being clustered. These points are the initial group centroids.
2. Assign each object to the group with the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. The new cluster center is calculated using the formula $V_i = \frac{1}{a} \sum_{j=1}^{a} x_j$, where $a$ is the number of data points in the $i$-th cluster and $x_j$ is the $j$-th data point of that cluster.
5. Repeat steps 2 and 3 until the centroids no longer move.
6. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
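The steps listed above can be sketched in a few lines of Python. This is an illustrative version only: the function name, the use of caller-supplied initial centroids and the squared Euclidean distance are assumptions, and the sample intensities are made up.

```python
def kmeans(points, centroids, max_iter=100):
    """Cluster tuples of numbers around the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Step 2: assign every point to the closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Steps 3 and 4: each new centroid is the mean of its members.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                a = len(members)
                new_centroids.append(tuple(sum(coord) / a for coord in zip(*members)))
            else:
                new_centroids.append(c)  # keep the centroid of an empty cluster
        # Step 5: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Illustrative gray levels: dark background versus bright tissue.
pixels = [(10,), (12,), (14,), (200,), (210,), (220,)]
centers, groups = kmeans(pixels, centroids=[(0,), (255,)])
```

For image segmentation, each pixel intensity (or feature vector) plays the role of a point, and the cluster index assigned to a pixel becomes its segment label.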

Nodule Detection
In this stage, the location of the pulmonary nodule is determined. Lung nodules are to be detected before nodule segmentation. Nodules mostly have a helical or circular structure, so they can be detected using the circularity property. Many methods are commonly used to identify round objects in images; one of the best known in digital image processing is the Circular Hough Transform (CHT) (Duda and Hart, 1972). It is a feature extraction technique derived from the Hough Transform, and its main purpose is to find circles in imperfect image inputs. To detect nodules of various sizes, different radii are chosen when applying the CHT.
Circle candidates are detected by voting in the Hough parameter space and then selecting the local maxima; the array that holds the votes is called the accumulator matrix.
The CHT relies on the equation of a circle (Duda and Hart, 1972). The equation $r^2 = (x - a)^2 + (y - b)^2$ describes a circle with center $(a, b)$ and radius $r$. The Hough transform is used to identify the parameters of a circle when a number of points that fall on its perimeter are known. The parametric equations of the circle with center $(a, b)$ and radius $R$ are $x = a + R\cos\theta$ and $y = b + R\sin\theta$.
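The voting scheme can be illustrated with a small Python sketch. Each edge point votes for every center $(a, b)$ that would place it on a circle of radius $r$, using $a = x - r\cos\theta$ and $b = y - r\sin\theta$; the cell with the most votes is taken as the detected circle. The function names, the angular sampling and the synthetic edge points are assumptions for illustration, and edge detection is assumed to have been done beforehand.

```python
import math
from collections import Counter


def circular_hough(edge_points, radii, n_angles=360):
    """Return an accumulator mapping (a, b, r) to the number of votes."""
    acc = Counter()
    for x, y in edge_points:
        for r in radii:
            cells = set()
            for k in range(n_angles):
                theta = 2 * math.pi * k / n_angles
                a = round(x - r * math.cos(theta))
                b = round(y - r * math.sin(theta))
                cells.add((a, b, r))
            acc.update(cells)  # at most one vote per cell from each edge point
    return acc


def best_circle(acc):
    """Pick the accumulator cell with the most votes (the global maximum)."""
    (a, b, r), votes = acc.most_common(1)[0]
    return a, b, r, votes


# Synthetic edge points lying exactly on a circle of radius 5 centered at (10, 10).
offsets = [(5, 0), (-5, 0), (0, 5), (0, -5),
           (3, 4), (3, -4), (-3, 4), (-3, -4),
           (4, 3), (4, -3), (-4, 3), (-4, -3)]
edges = [(10 + dx, 10 + dy) for dx, dy in offsets]
a, b, r, votes = best_circle(circular_hough(edges, radii=[4, 5, 6]))
```

Trying several radii, as in the `radii` argument, corresponds to searching for nodules of different sizes. A full detector would keep every local maximum above a threshold rather than only the global one.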

Feature Extraction
From the segmented nodules in the image, the various features required for classification are extracted. Based on these extracted features, each nodule is classified as benign or malignant. First, the first-order statistical features are calculated from the gray-level histogram values of the segmented nodule image.
In this study, six basic features are calculated: mean, standard deviation, skewness, entropy, variance and kurtosis (Mao et al., 2000). GLCM features are also extracted from the image.
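The six first-order features can be computed directly from the pixel values of a segmented nodule, as in the sketch below. The exact definitions used in the study are not stated, so the population (divide by n) moments, the non-excess kurtosis and the base-2 histogram entropy used here are assumptions, as is the function name.

```python
import math
from collections import Counter


def first_order_features(pixels):
    """Six histogram-based statistics of a flat list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    if std > 0:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    else:
        skewness = kurtosis = 0.0  # constant region: higher moments undefined
    # Shannon entropy of the normalized gray-level histogram.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }


features = first_order_features([1, 1, 3, 3])  # illustrative gray levels
```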
A total of 88 GLCM features were extracted along the directions 0°, 45°, 90° and 135° with distance d = 2. The wavelet decomposition transform provides the different region (TEF) based energy distribution features. In 2D wavelet decomposition, the ROI (Region of Interest) of the CT image is divided into four sub-bands.
With the help of the wavelet decomposition transform, four images are obtained from the processed image: one with low frequencies and three with high frequencies. The 2D features extracted from the CT image of the lungs are 6 statistical features, 88 (22 × 4) GLCM features and 13 wavelet features, 107 in total.
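To make the GLCM step concrete, the sketch below builds a gray-level co-occurrence matrix for each of the four directions at distance d = 2 and derives three texture measures from it. The study extracts 22 features per direction but does not list them here, so contrast, energy and homogeneity are chosen only as familiar examples; the non-symmetric matrix, the offset convention (rows grow downward) and all names are assumptions.

```python
# (row step, column step) for each direction, per unit distance.
ANGLE_OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(img, dy, dx, levels):
    """Normalized co-occurrence matrix of gray levels at offset (dy, dx)."""
    h, w = len(img), len(img[0])
    m = [[0.0] * levels for _ in range(levels)]
    total = 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                m[img[y][x]][img[yy][xx]] += 1
                total += 1
    return [[v / total for v in row] for row in m] if total else m


def glcm_features(p):
    """Three example texture measures computed from a normalized GLCM."""
    contrast = energy = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            energy += v * v
            homogeneity += v / (1 + abs(i - j))
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


def directional_features(img, levels, d=2):
    """Features for 0, 45, 90 and 135 degrees at pixel distance d."""
    return {
        angle: glcm_features(glcm(img, sy * d, sx * d, levels))
        for angle, (sy, sx) in ANGLE_OFFSETS.items()
    }


# Illustrative two-level image with a vertical boundary.
img = [[0, 0, 1, 1] for _ in range(4)]
features = directional_features(img, levels=2)
```

Concatenating the per-direction feature vectors gives one GLCM descriptor per nodule, in the same way that 22 features over 4 directions yield the 88 values mentioned above.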

Nodule Classification
The Probabilistic Neural Network (PNN) is used here for classification. The number of input neurons is set to 107, the number of input features. The PNN is a simple neural network and takes little time for classification (Mao et al., 2000).
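A PNN can be viewed as a Parzen-window classifier: a pattern layer evaluates a Gaussian kernel between the query and every training sample, a summation layer averages the kernel values per class, and the decision layer picks the class with the largest average. The sketch below follows that structure. The smoothing parameter `sigma`, the function name and the two-dimensional toy vectors are assumptions; the system described here would use 107-dimensional feature vectors.

```python
import math


def pnn_classify(x, train_X, train_y, sigma=1.0):
    """Return (predicted label, per-class scores) for feature vector x."""
    sums, counts = {}, {}
    for xi, yi in zip(train_X, train_y):
        # Pattern layer: Gaussian kernel centered on one training sample.
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        k = math.exp(-d2 / (2 * sigma ** 2))
        # Summation layer: accumulate kernel responses per class.
        sums[yi] = sums.get(yi, 0.0) + k
        counts[yi] = counts.get(yi, 0) + 1
    scores = {c: sums[c] / counts[c] for c in sums}
    # Decision layer: the class with the highest average response wins.
    return max(scores, key=scores.get), scores


# Toy training set with made-up two-dimensional features.
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = ["benign", "benign", "malignant", "malignant"]
label, scores = pnn_classify((0.05, 0.1), train_X, train_y, sigma=0.5)
```

Because there is no iterative weight training, adding a new labeled nodule only requires appending its feature vector to the training set, which is consistent with the short classification time noted above.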

CBIR (Content Based Image Retrieval)
The most similar images are retrieved from the LIDC database using the CBIR technique. The images are stored separately as malignant and benign, along with their medical descriptions. When an input is fed to the system, the system first classifies it as benign or malignant and then retrieves the most similar images from the database using the Euclidean distance, also known as the L2 distance. If $u = (a, b)$ and $v = (c, d)$ are two points, the Euclidean distance between $u$ and $v$ is given by $\operatorname{dist}(u, v) = \sqrt{(a - c)^2 + (b - d)^2}$. Based on these measurements, images are retrieved in order of similarity.
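The retrieval step amounts to ranking the stored feature vectors by their Euclidean distance to the query. A minimal sketch follows; the database layout (a list of name and feature-vector pairs), the `top_k` parameter and the sample entries are hypothetical.

```python
import math


def euclidean(u, v):
    """L2 distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def retrieve(query, database, top_k=3):
    """Return the top_k (name, distance) pairs, most similar first."""
    ranked = sorted(database, key=lambda item: euclidean(query, item[1]))
    return [(name, euclidean(query, feat)) for name, feat in ranked[:top_k]]


# Hypothetical database entries with made-up two-dimensional features.
database = [
    ("img_a", (1.0, 1.0)),
    ("img_b", (5.0, 5.0)),
    ("img_c", (0.0, 0.5)),
]
results = retrieve((0.0, 0.0), database, top_k=2)
```

In the system described here, the search would be restricted to the class (benign or malignant) predicted for the query, and each stored entry would carry its medical description.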

RESULTS AND ANALYSIS
The proposed system was implemented using MATLAB. The dataset for the project was collected from the LIDC dataset and from a local hospital for validating the results. A total of 100 images were taken from the LIDC database, of which 50 were benign and 50 were malignant. For testing the performance of the system, 10 images belonging to the two classes were used. After that, validation was done using 60 images from the local database. A sample image selected from the database is shown in Figure 2.
The sample images are then sharpened to remove the noise introduced during CT imaging of the lungs, as shown in Figure 3. The lung volume extracted after the morphological operation is shown in Figure 4. The lung segmentation done through K-means clustering is shown in Figure 5. The lung nodules detected using the Circular Hough Transform with smaller and larger radii are shown in Figure 6. The extracted lung nodules with smaller and larger radii are shown in Figure 7.
From the extracted benign and malignant nodules, six statistical features, 22 GLCM features and wavelet features are extracted. Of these, the GLCM features gave the most accurate results, so more weightage is given to them. Finally, the features are fed to the neural network and classification is done. Table 1 gives the GLCM features of benign and malignant nodules.
Most of the GLCM features have high values for malignant nodules. For getting accurate results, these features are fed to the neural network. Table 2 shows that the LIDC database has higher accuracy on neural network detection of cancer.
The performance evaluation of the PNN using a confusion matrix is shown in Table 3. The content-based image retrieval results for a benign and a malignant tumor are shown in Figure 8 and Figure 9. The performance evaluation of CBIR is shown in Table 4 and Table 5.
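For reference, accuracy, precision and recall follow from the four cells of a binary confusion matrix as sketched below. The counts in the example are made up purely to exercise the formulas and are not the values reported in the tables; the function name is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # correct among actual positives
    return accuracy, precision, recall


# Illustrative counts only (not results from this study).
accuracy, precision, recall = binary_metrics(tp=8, fp=2, fn=1, tn=9)
```

For retrieval, the same precision and recall formulas apply with relevant retrieved images counted as true positives.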

CONCLUSION
The proposed lung cancer detection system was implemented successfully, with 97% accuracy using the PNN. A lung nodule database was prepared with benign and malignant classes. From the large, unsorted LIDC database, similar images are retrieved using content-based image retrieval (CBIR). The performance of the proposed system was also validated on the local database, where it obtained a detection accuracy of 95%. This system helps the doctor improve the level of treatment through correct diagnosis and also helps to identify the present stage of the patient by retrieval from a large database. In the future, the successful treatment method of each patient can be attached to the image, which can then act as a reference.