Enhancing Lung Cancer Detection from Lung CT Scan Using Image Processing and Deep Neural Networks

ABSTRACT


INTRODUCTION
The rising incidence of lung cancer has created an urgent need to detect the disease at an early stage [1][2][3]. Over the years, early detection and diagnosis of lung cancer have led to improved patient outcomes and survival rates [4]. Among imaging modalities, Computed Tomography (CT) has emerged as a powerful tool for visualising the internal structures of the lungs with high-resolution images [5]. Lung cancer has historically been associated with high mortality rates because it is difficult to detect [6]. Because CT scans capture detailed cross-sectional images of the lungs, radiologists and clinicians can detect suspicious lesions, nodules, and tumours at an earlier stage than before [7].
The contribution of this study lies in the development of enhanced deep learning approaches that leverage Complexity Feature extraction and GLCM feature extraction techniques, along with CNNs, for accurate lung cancer detection and classification. The use of DOST as an intermediate stage further enhances the performance of these methods [8][9][10][11].
Imaging modalities such as CT and MRI have transformed the detection and characterization of lung cancer, providing vital information about tumour size, location, and extent [12][13][14]. Furthermore, advances in computer-aided diagnosis (CAD) systems have improved the accuracy and efficiency of lung cancer detection by assisting radiologists in interpreting complex CT images [15]. Despite these advances, detecting lung cancer accurately and precisely remains a difficult task. The sheer volume of CT scans produced in clinical settings, combined with subtle differences in tumour appearance and the need for early detection, necessitates robust and efficient automated approaches [16][17][18]. Deep learning techniques have shown enormous promise in this area.
In the TNM system, for example, Stage I lung cancer generally refers to tumours that are relatively small and confined to the lung tissue, without evidence of spread to lymph nodes or distant sites. These are considered 'early stage' cancers because they are localized and have not yet advanced to a more aggressive or widespread form.
The need for a morphology-based deep learning approach for precise lung cancer detection stems from the limitations of traditional methods, which are time-consuming and prone to errors. With deep learning, meaningful features can be learned and extracted automatically from CT scans, allowing for more accurate identification and classification of lung cancer.

RELATED WORKS
This section reviews current research on lung cancer diagnosis, medical imaging modalities, and the use of deep learning techniques. By analysing and synthesising this literature, the review lays the foundation for the proposed morphology-based deep learning strategy and supports its significance in advancing lung cancer detection from lung CT scans.
Dritsas et al. proposed Rotation Forest in 2022, a high-performance algorithm assessed via well-known metrics [19]. They reported 97.1% accuracy. However, one limitation of the research is that it did not discuss the potential challenges or limitations of the Rotation Forest algorithm.
Naseer et al. [20] used the LUNA16 dataset, covering various stages of lung nodules, to implement a CNN with optimizers. While they achieved an accuracy of 97.42%, one limitation of traditional CNNs, as mentioned in their study, is the need for a large amount of labelled data for training, which can be difficult to obtain and may affect the generalizability of the results. Venkadesh et al. [21] used an ensemble model in 2021 for feature extraction. These features were then combined as input for classification. The limitation of this approach is the lack of discussion about the potential drawbacks of the ensemble model used.
For early tumour diagnosis, Agarwal et al. [22] proposed a standard Convolutional Neural Network (CNN) with the AlexNet network model. Their research used a private dataset and had a 96% accuracy rate. The use of a private dataset, however, is a limitation of this study, as it may introduce bias and limit the generalizability of the findings to other datasets. Masud et al. [23] discussed the use of a light CNN architecture in 2020 and achieved a high classification accuracy of 97.9%. However, one limitation of this study is that it only looked at the LIDC dataset, leaving the performance on other datasets unexplored. Similarly, Al-Yasriy et al. [24] proposed a CNN technique for cancer detection and categorization using AlexNet. Despite their accuracy of 93.548%, the use of an imbalanced dataset poses a limitation, as imbalanced data can lead to biased model performance and reduced effectiveness in detecting minority classes. In 2019, Toraman et al. [25] suggested using Fourier Transform Infrared (FTIR) spectroscopy signals. They attained an accuracy of 95.71%.
Nasser et al. [26] created an Artificial Neural Network (ANN) with an accuracy of 96.67% for lung nodule detection. A limitation of this study is the lack of a detailed analysis or discussion of the challenges encountered during the ANN development and training process. Selvanambi et al. [27] demonstrated Glow-worm Swarm Optimisation (GSO) in 2018, with an accuracy of 98%. However, the study lacks a comprehensive discussion of the potential challenges or limitations associated with the GSO algorithm and its application in lung cancer prediction. Zhao et al. [28] proposed a hybrid CNN that makes use of networks like LeNet and AlexNet. They reported an 87.7% accuracy rate. However, one limitation of this study is the comparatively lower accuracy obtained, which may affect the dependability and effectiveness of the proposed hybrid CNN approach.
Table 1 provides a clear overview of the strengths and limitations of each study. In summary, early detection of lung cancer can lead to improved outcomes, reduced mortality rates, and a better quality of life for individuals diagnosed with the disease. It also has broader societal and economic benefits, making it a crucial focus area in the fight against lung cancer.
CAD systems are specialized software tools designed to assist healthcare professionals in the interpretation of medical images, such as X-rays, CT scans, and MRIs. These systems use advanced algorithms and machine learning techniques to analyze images and highlight areas of interest that may require further examination. They aim to improve accuracy and efficiency in the diagnostic process. The steps included in CAD are image preprocessing, feature extraction, classification, and alert generation. The limitations of such processing include false negatives, dependence on the quality of input images, lack of clinical context, and restriction to image analysis.
The research findings might have positive implications for patients in the following ways: reduced invasive procedures, decreased psychological burden, improved quality of life, enhanced monitoring and surveillance, personalized treatment approaches, and earlier detection and treatment.
In a nutshell, while these studies have made substantial contributions to lung cancer detection and classification, it is critical to recognise the limitations of each approach. Addressing these limitations can help future research in this area improve its accuracy, generalizability, and effectiveness.
The Otsu method is a thresholding technique used to segment images. It calculates an "optimal" threshold value to separate foreground and background pixels. The key parameter in the Otsu method is the threshold value, which is determined by maximizing the between-class variance.

METHODOLOGY
Stage 1: Pre-Processing:
Step i. Import the Lung Cancer CT scan dataset from LIDC.
Step ii. Perform a colour mapping process to convert the RGB image to grayscale.
a. Colour mapping is the process of converting an RGB image to grayscale by calculating the luminance or intensity value for each pixel. The luminosity method is one of the most commonly used formulas for performing this conversion.
The luminosity method calculates the grayscale value (Gray) from the RGB values of a pixel (R, G, B) using the following formula:

$$\mathrm{Gray} = 0.21\,R + 0.72\,G + 0.07\,B$$

In this formula, the coefficients 0.21, 0.72, and 0.07 represent the perceived luminance contributions of the red, green, and blue channels, respectively.
Step iii. Apply histogram equalization to enhance the image contrast (sub-steps a to d are given in the algorithm listing of Figure 1).
e. For display purposes, round the output pixel values to the nearest integer.
f. When compared to the original grayscale image, the resulting image will have more contrast.
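The luminosity conversion and the histogram-equalisation mapping of Equation (2) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study (which reports using MATLAB): the function names `rgb_to_gray` and `equalize_histogram` are hypothetical, the input is assumed to be an 8-bit image, and a constant image is returned unchanged to avoid a zero denominator.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Luminosity method: weighted sum of the R, G and B channels."""
    rgb = np.asarray(rgb, dtype=np.float64)       # shape (H, W, 3)
    weights = np.array([0.21, 0.72, 0.07])        # coefficients quoted in the text
    gray = rgb @ weights                          # shape (H, W)
    # Round for display and keep the 8-bit range.
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)

def equalize_histogram(gray, levels=256):
    """Histogram equalisation using the normalised CDF of Equation (2)."""
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=levels)   # a. histogram
    cdf = np.cumsum(hist)                                # b. cumulative counts
    cdf_min = cdf[cdf > 0][0]                            # smallest non-zero CDF value
    total = gray.size                                    # M * N
    if total == cdf_min:                                 # constant image (assumption: leave as is)
        return gray.copy()
    # c. normalise the CDF to [0, L - 1]
    mapping = (cdf - cdf_min) * (levels - 1) / (total - cdf_min)
    mapping = np.clip(np.round(mapping), 0, levels - 1).astype(np.uint8)
    return mapping[gray]                                 # d. look up every pixel
```

Calling `equalize_histogram(rgb_to_gray(image))` on an `(H, W, 3)` array chains the two pre-processing steps; the darkest intensity present maps to 0 and the brightest to 255.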
Stage 2: Thresholding and Filtering:
Step i. Perform global image thresholding using the Otsu method to segment the image into foreground and background.
a. Create a histogram of the grayscale input image. Assume the grayscale image has intensity values ranging from 0 to L-1, where L is the number of possible intensity levels. The histogram is a 1D array of L elements, with the element at index i representing the number of pixels in the image with intensity i.
b. Normalise the histogram. Divide each histogram element by the total number of pixels in the image. This step transforms the histogram into a probability distribution that sums to 1.
c. Determine the cumulative distribution function (CDF) of the normalised histogram. The CDF at i is calculated by adding the normalised histogram values from 0 to i, where i ranges from 0 to L-1. The CDF is also a 1D array with L elements.
d. Determine the cumulative and total means. The cumulative mean at intensity i is calculated by multiplying each intensity value by its corresponding normalised histogram value and adding the results from 0 to i. The total mean is the cumulative mean evaluated at L-1.
e. For each possible threshold t from intensity 0 to L-1, compute the between-class variance using the following equation:

$$\sigma_B^2(t) = \frac{\left[\mu_T\,P(t) - \mu(t)\right]^2}{P(t)\,\left[1 - P(t)\right]}$$

where P(t) is the probability of the pixels with intensity values less than or equal to the threshold, \mu(t) is the cumulative mean at t, and \mu_T is the total mean. The threshold that maximises the between-class variance is chosen.
f. Segment the image using the chosen threshold. Set all pixel intensities below the threshold to 0 and all pixel intensities equal to or greater than the threshold to 255.
Step ii. Apply a binarization process to convert the thresholded image into a binary image.
Step iii. Apply a smoothing effect using the Sobel filter to reduce noise and highlight edges.
Step iv. Perform a multi-dimensional filtering process to enhance specific features.
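The Otsu procedure of Step i (sub-steps a to f) can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes an 8-bit grayscale input and uses the standard Otsu between-class variance written in terms of the CDF P(t), the cumulative mean, and the total mean. The names `otsu_threshold` and `binarize` are hypothetical. Because P(t) covers intensities less than or equal to t, the sketch marks pixels strictly greater than t as foreground, which is the rule of sub-step f with a threshold of t + 1.

```python
import numpy as np

def otsu_threshold(gray, levels=256):
    """Return the intensity t that maximises the between-class variance."""
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=levels).astype(np.float64)  # a.
    prob = hist / hist.sum()                          # b. normalised histogram
    cdf = np.cumsum(prob)                             # c. P(t)
    cum_mean = np.cumsum(np.arange(levels) * prob)    # d. cumulative mean mu(t)
    total_mean = cum_mean[-1]                         # d. total mean mu_T
    denom = cdf * (1.0 - cdf)
    sigma_b = np.zeros(levels)
    valid = denom > 0                                 # skip thresholds with an empty class
    sigma_b[valid] = (total_mean * cdf[valid] - cum_mean[valid]) ** 2 / denom[valid]  # e.
    return int(np.argmax(sigma_b))

def binarize(gray, t):
    """f. Pixels above t become 255 (foreground), the rest 0 (background)."""
    return np.where(np.asarray(gray) > t, 255, 0).astype(np.uint8)
```

For a clearly bimodal image the returned threshold falls between the two intensity clusters, so `binarize(gray, otsu_threshold(gray))` separates them into 0 and 255.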
The steps discussed above can be summarised as an algorithm.
Stage 3: Feature Extraction:
Step i. Apply morphological image processing by introducing a structuring element to extract relevant features.
Step ii. Perform a dilation operation to expand the regions of interest.
Step iii. Create a gray-level co-occurrence matrix (GLCM) to capture spatial relationships between pixels.
a. Load the input image and import the necessary libraries.
b. If the input image is not already grayscale, convert it to grayscale.
c. Define the distance and angle offsets for the GLCM calculation. These offsets determine the neighbouring pixels used for co-occurrence measurements.
d. Set the number of grey levels to be used in quantizing the grayscale image. This determines the size of the GLCM matrix.
e. Create an empty GLCM matrix of size N_g x N_g, where N_g is the number of grey levels set in step d.
f. Iterate over each pixel in the grayscale image:
i. Determine the co-occurring pixel based on the distance and angle offsets specified.
ii. Based on the grayscale values of the current and co-occurring pixels, increment the corresponding element in the GLCM matrix.
g. Normalise the GLCM matrix by dividing each element by the sum of all elements in the matrix. This step ensures that the GLCM is scale-invariant.
Step iv. Compute statistics from the GLCM, such as energy, contrast, and entropy, as features.
Step v. Carry out a rank correlation process to select the most informative features.
a. For each computed texture feature, assign ranks to the data points in each dataset based on their feature values:
i. Sort the data points in each dataset according to their feature values.
ii. Give each data point a rank based on its position in the sorted list. If there are ties, give the tied data points the average rank.
b. For the current texture feature, compute the difference in ranks for each data point in both datasets.
c. Square the differences to remove the sign.
d. Add together all of the squared differences for the current texture feature to get the sum of squared differences (SSD).
e. Let n be the total number of data points.
f. Calculate the rank correlation coefficient using the formula:

$$\rho = 1 - \frac{6\,\mathrm{SSD}}{n\left(n^2 - 1\right)}$$

This is the formula for Spearman's rank correlation coefficient.
Pass the resulting features to Stage 4.
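Steps iii to v of Stage 3 can be sketched as follows. The code is a minimal illustration under stated assumptions, not the authors' implementation: the input is an 8-bit grayscale image large enough to contain at least one pixel pair for the chosen offset, the offset is given as a hypothetical `(dx, dy)` pair instead of a distance and angle, energy is taken as the sum of squared GLCM entries, entropy uses a base-2 logarithm, and the Spearman formula shown is exact only when there are no tied ranks.

```python
import numpy as np

def glcm(gray, levels=8, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for one (dx, dy) offset."""
    gray = np.asarray(gray, dtype=np.uint8)
    q = (gray.astype(np.int64) * levels) // 256        # d. quantise to `levels` grey levels
    h, w = q.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ny, nx = ys + dy, xs + dx                          # f.i. co-occurring pixel
    ok = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)   # keep pairs inside the image
    m = np.zeros((levels, levels), dtype=np.float64)   # e. empty matrix
    np.add.at(m, (q[ys[ok], xs[ok]], q[ny[ok], nx[ok]]), 1.0)  # f.ii. count pairs
    return m / m.sum()                                 # g. normalise

def glcm_features(p):
    """Energy, contrast and entropy of a normalised GLCM p."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]                                      # avoid log2(0)
    return {
        "energy": float(np.sum(p ** 2)),
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "entropy": float(-np.sum(nz * np.log2(nz))),
    }

def average_ranks(values):
    """Ranks starting at 1; tied values share the average rank."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman coefficient from the sum of squared rank differences."""
    d = average_ranks(x) - average_ranks(y)
    n = len(d)
    return 1.0 - 6.0 * float(np.sum(d ** 2)) / (n * (n ** 2 - 1))
```

A feature whose values rank the samples in the same order as a reference gives a coefficient of 1, and a fully reversed order gives -1.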

Stage 4: CNN Model Training and Classification:
Step i. Train a CNN model using the features that have been extracted from the previous stage.
Step ii. Utilize a pre-defined CNN model architecture for training.
Step iii. Compute classification metrics.
Step iv. Perform lung cancer detection by classifying whether the image is cancerous or not.
Step v. Evaluate the performance of the overall system.
The proposed lung cancer detection method using the Improved Enhanced algorithm has potential real-world applicability in clinical settings. However, several considerations and potential hurdles need to be addressed for successful integration into a clinical workflow: data acquisition and integration, pre-processing and computational resources, integration with existing systems, clinical validation and regulatory approval, interpretability and explainability, continuous monitoring and improvement, and legal and ethical considerations.

EXPERIMENTAL RESULTS AND ANALYSIS
Figure 2. Original lung scan image
The image in Figure 2 is an original lung scan image obtained from the LIDC dataset via the Kaggle platform. The LIDC dataset is a well-known dataset used in medical imaging research, specifically for lung cancer analysis and detection. For this study, a subset of 32 samples from the LIDC dataset was selected for training. The original lung scan image is the starting point for further processing and analysis. The resized version of the original lung scan image is depicted in Figure 3.
Resizing is an important step in medical image classification tasks for several reasons. First, it increases computational efficiency: the computational resources and processing time required for feature extraction and classification are reduced when images are resized. Second, resizing addresses memory constraints, particularly when working with large datasets. Reducing image dimensions lowers memory usage, allowing for smoother execution of classification algorithms. Furthermore, resizing ensures consistent image sizes, allowing for compatibility across images during the classification process. It also makes other preprocessing steps, such as feature extraction, normalisation, and data augmentation, easier.
The global thresholded image obtained using the Otsu method is shown in Figure 5. The Otsu method is a popular image segmentation thresholding technique. Its goal is to find the best threshold value for separating the image into foreground and background regions. The Otsu method determines a threshold that maximises the separation of these two regions by calculating the between-class variance of the intensity values. The thresholded image in Figure 5 emphasises the distinct regions within the lung scan, allowing subsequent analysis and feature extraction to focus on specific areas of interest.
The image in Figure 6 is the result of applying a multidimensional filter, specifically the Sobel filter, to the thresholded lung scan image. The Sobel filter is a popular edge detection filter that emphasises sharp intensity transitions in images. By convolving the Sobel filter with the thresholded image, the resulting image in Figure 6 emphasises the edges and boundaries of structures within the lung scan. This edge data is useful for further analysis and feature extraction, assisting in the identification and characterization of important anatomical structures or abnormalities. In total, 32 samples are used to train the lung cancer detection process, and the features extracted using the GLCM process are tabulated in Table 2 for 5 samples that have been processed by the proposed system.
These features offer quantitative representations of specific lung image characteristics relevant for cancer detection.
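The Sobel edge detection described above for Figure 6 can be illustrated with a short NumPy sketch. It is not the study's implementation: the helper names `filter3x3` and `sobel_magnitude` are hypothetical, image borders are handled by edge replication (an assumption), and the kernels are applied as a correlation, which differs from a true convolution only in the sign of the response and therefore leaves the gradient magnitude unchanged.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel using edge-replicated borders."""
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the padded image weighted by one kernel entry.
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def sobel_magnitude(image):
    """Gradient magnitude: large where the intensity changes sharply."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.hypot(gx, gy)
```

On a binary image with a single vertical boundary, the magnitude is zero inside the uniform regions and peaks in the two columns adjacent to the boundary.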

Performance evaluation
The accuracy of the lung cancer detection can be calculated using the following formula:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP: True Positive (identified tumours); TN: True Negative; FP: False Positive; FN: False Negative (not identified).
Specificity is the proportion of true negatives identified correctly by the model. It indicates the model's ability to correctly classify non-tumour or non-cancer cases.

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

Sensitivity is the proportion of true positives that were identified by the model. It indicates the model's ability to correctly classify tumour or cancer cases.

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
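The three measures can be computed directly from the confusion-matrix counts. The sketch below uses the hypothetical function name `classification_metrics` and assumes that the evaluation set contains at least one positive and one negative case; the counts in the usage line are purely illustrative and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, specificity and sensitivity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,      # correctly classified cases overall
        "specificity": tn / (tn + fp),      # true negatives among actual negatives
        "sensitivity": tp / (tp + fn),      # true positives among actual positives
    }

# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=45, tn=50, fp=2, fn=3)
```

Reporting all three values together matters because accuracy alone can look high on an imbalanced dataset even when sensitivity is poor.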
Table 3 provides an overview of the proposed method's accuracy performance evaluation in comparison to previous methods used for lung cancer detection. The proposed method achieved 99.269% accuracy. This accuracy rate indicates the proposed approach's ability to correctly classify lung cancer cases. The table also compares the proposed method to others, such as Machine Learning [19], CNN Alexnet + SGD [20], Alexnet CNN [22], CNN [23], CNN with Alexnet [24], and ML with FTIR Signals [25]. The comparison indicates that the proposed method improved lung cancer detection accuracy relative to existing techniques, which included both traditional machine learning and deep learning methods.

Year   Method                       Accuracy (%)
2020   CNN [23]                     97.9
2020   CNN with Alexnet [24]        93.458
2019   ML with FTIR Signals [25]    95.71
       Proposed Method              99.269

Figure 8 shows a comparison plot for lung cancer detection accuracy. The plot depicts the accuracy performance of various methods, including the proposed morphology-based deep learning approach. Figure 8 illustrates the improvement in accuracy provided by the proposed approach when compared to existing techniques. These findings emphasise the proposed method's potential to improve lung cancer diagnosis and contribute to more accurate and efficient clinical decision-making processes.
Table 4 compares the proposed methodology to several previous methods in terms of specificity and sensitivity in lung cancer detection. The proposed method had a 99.1251% specificity and a 99.1121% sensitivity. These values represent the proposed method's ability to accurately identify true negatives (specificity) and true positives (sensitivity) in lung cancer detection. The table compares the proposed method's performance to CNN Alexnet + SGD [20], Deep Learning [21], CNN with Alexnet [24], and ML with FTIR Signals [25]. The proposed methodology has higher specificity and sensitivity values than all of these previous methods. This suggests that the proposed morphology-based deep learning approach has improved the specificity and sensitivity of lung cancer detection relative to existing techniques.
A comparison plot for specificity and sensitivity in lung cancer detection is shown in Figure 9. The plot depicts the performance differences between the proposed morphology-based deep learning approach and the previous methods in terms of specificity and sensitivity. The results presented in the study demonstrate a significant advancement in the field of lung cancer detection. The proposed method achieved an accuracy of 99.269%, specificity of 99.1251%, and sensitivity of 99.1121%. These results indicate a high level of accuracy in classifying both cancerous and non-cancerous cases.
In the context of lung cancer detection, these results have several important implications:
Improved Clinical Decision-Making: The high accuracy, specificity, and sensitivity of the proposed method suggest that it could serve as a reliable tool for assisting healthcare professionals in the early detection of lung cancer. This can lead to more accurate diagnoses and treatment plans.
Early Detection and Intervention: High sensitivity means that the proposed method is effective in correctly identifying true positives (cases of lung cancer). Early detection of cancer is crucial for timely intervention and improved patient outcomes. With a sensitivity of 99.1121%, the proposed method performs well in this respect.
Reduced False Positives: The high specificity value (99.1251%) implies a low rate of false positives. This is particularly important in clinical practice, as it minimizes the chances of unnecessary follow-up tests or procedures for patients who do not have lung cancer.
Potential for Screening Programs: The high accuracy of the proposed method makes it a promising candidate for use in large-scale lung cancer screening programs. Such programs can be instrumental in identifying cases at an early, more treatable stage.
Resource Optimization: The reduction in false positives and false negatives, as indicated by the high specificity and sensitivity, respectively, can lead to more efficient allocation of healthcare resources. It can help in prioritizing cases that require immediate attention.
Enhanced Patient Outcomes: Accurate and timely diagnosis of lung cancer can significantly improve patient outcomes. It can lead to earlier treatment initiation, potentially increasing survival rates and overall quality of life for patients.
Research and Development: The high performance of the proposed method may encourage further research and development in the field of medical image analysis for lung cancer detection. This could lead to continuous advancements in detection techniques and tools.

CONCLUSIONS
In this paper, we presented a methodology for detecting lung cancer using lung cancer images obtained from the LIDC database. To achieve accurate detection results, the proposed method combines various image processing and analysis techniques. Relevant features are extracted from lung cancer images using pre-processing, segmentation, and feature extraction. These features are then fed into a deep neural network architecture. The experimental results show that the proposed methodology is effective, with a high specificity of 99.1251%, sensitivity of 99.1121%, and overall accuracy of 99.269%.
The study's findings represent a significant advancement in both the accuracy of lung cancer detection and the potential application of deep learning techniques in medical imaging. This not only has direct implications for patient care but also contributes to the broader landscape of medical research and practice. The use of MATLAB as a computing tool ensured efficient implementation and consistent results. Finally, this study demonstrated the utility of image processing and deep learning techniques in the detection of lung cancer. The proposed methodology achieves high accuracy, specificity, and sensitivity, indicating that it is effective in identifying lung cancer cases. In the future, the approach's overall performance and applicability in clinical practice will be improved through further refinement of the methodology, validation through extensive clinical trials, and integration with complementary data sources.

Figure 1 represents the proposed process flow block diagram of the new Improved Enhanced algorithm for detecting lung cancer. The algorithm incorporates several key stages. The Otsu threshold value is used to classify pixels into foreground and background based on their intensity values. The CNN architecture typically includes an input layer, convolutional layers, activation functions, pooling layers, fully connected layers, an output layer, a loss function and optimizer, and regularization and dropout, together with a chosen number of layers and units.

Figure 1. Proposed lung cancer detection method

Algorithm
Stage 1: Pre-Processing:
Step i. Import the Lung Cancer CT scan dataset from LIDC.
Step ii. Perform a colour mapping process to convert the RGB image to grayscale.
a. Colour mapping is the process of converting an RGB image to grayscale by calculating the luminance or intensity value for each pixel. The luminosity method is one of the most commonly used formulas for performing this conversion. The luminosity method calculates the grayscale value (Gray) from the RGB values of a pixel (R, G, B) using the following formula:

$$\mathrm{Gray} = 0.21\,R + 0.72\,G + 0.07\,B$$

Step iii. Apply histogram equalization to enhance the image contrast.
a. Compute the histogram of the grayscale input image. The histogram represents the frequency of occurrence of each intensity value in the image.
b. Determine the cumulative distribution function (CDF) of the histogram. The CDF represents the cumulative probability of occurrence for each intensity value.
c. Normalise the CDF to a scale of [0, 255] (for 8-bit grayscale images). This step ensures that histogram equalisation works across the entire intensity range.

$$\mathrm{cdf}_{\mathrm{normalized}} = \frac{\left(\mathrm{cdf} - \mathrm{cdf}_{\min}\right)\,(L - 1)}{M N - \mathrm{cdf}_{\min}} \qquad (2)$$

where cdf_normalized: normalized CDF values; cdf: cumulative distribution function; cdf_min: minimum value of the CDF; L: number of intensity levels (typically 256 for 8-bit images); M: number of rows; N: number of columns.
d. Using the normalised CDF, apply the histogram equalisation transformation to each pixel in the grayscale image. Replace each pixel's intensity value with its corresponding value in the normalised CDF:

$$I_{\mathrm{eq}} = \mathrm{cdf}_{\mathrm{normalized}}\left[I\right]$$

where I is the original pixel intensity and I_eq is the equalised intensity.

The entropy feature is computed as

$$\mathrm{Entropy} = -\sum_{i,j} p(i,j)\,\log_2 p(i,j)$$

where p(i, j) denotes the (i, j) entry of the normalised GLCM.

Figure 3. Resized image

The image in Figure 4 shows the outcome of histogram equalisation on the resized lung scan image. Histogram equalisation is a technique used in image processing to improve contrast. It aims to maximise the use of an image's dynamic range by redistributing its intensity values. This process produces a more contrasted image, making the underlying structures and abnormalities more visible. Histogram equalisation is especially useful in medical image analysis because it improves the visibility of subtle features and aids in the accurate identification of abnormalities.

Figure 8. Comparative plot for accuracy in lung cancer detection

Figure 9. Comparative plot for specificity and sensitivity in lung cancer detection

The proposed method's high accuracy, specificity, and sensitivity have several direct implications for clinical practice: enhanced diagnoses, reduction in false positives, early intervention and treatment, resource allocation, potential for screening programs, and improved patient experience. The findings of this study also have broader implications for the field of lung cancer detection and medical image analysis: advancement in detection techniques, integration of deep learning in medical imaging, potential for transferable techniques, contribution to research and clinical practice, and impetus for further innovation.

Table 1. Research gaps

Table 2. Features extracted for samples of lung cancer images

Table 3. Accuracy performance evaluation

Table 4. Performance evaluation for specificity and sensitivity in lung cancer detection