Early Diagnosis of Lung Tumors for Extending Patients’ Life Using Deep Neural Networks

The medical community is increasingly concerned with lung cancer analysis. Manual segmentation of lung cancers by medical experts is time-consuming and needs to be automated. The objective of this research study is to diagnose lung tumors at an early stage using deep learning techniques, in order to extend patients' lives. A Computer-Aided Diagnostic (CAD) system aids in the diagnosis and shortens the time necessary to detect the tumor. The application of


Introduction
The lung is one of the most complicated organs in the human body and needs to be observed closely. In specific scenarios, lung cancers are harmful and often lead to fatalities due to abnormal cell growth within the lung [1,2]. Many valuable lives are lost due to incorrect lung cancer estimation and identification. In biomedical image processing, which uses computer methods, several significant studies have been done to correctly segment and categorize lung cancers [3]. However, image grouping and segmentation algorithms suffer from varying image content, occlusion, noisy images, chaotic objects, consistent image texture, and other problems. Magnetic Resonance Imaging (MRI) scans are often used to identify the existence of lung cancers by segmenting them and then classifying them using a dataset [4]. In addition, during this process, further information about the location of the cancer, beyond what is seen on the Computed Tomography (CT) scan and the X-ray, is traced [5].
Several researchers focus mainly on lung cancer segmentation and classification, with the discovery of lung tumors as the primary goal. Lung cancer segmentation is difficult because of the vast amount of data required for testing [6]. During segmentation, the image is separated into pieces based on pixels with comparable characteristics [7]. In hospitals, specialists and physicians typically use manual segmentation to extract the affected zone from MRI scans. According to the literature, several algorithms concentrate on segmentation, classification, or feature extraction [8]. However, these existing methods have improved MRI image analysis only marginally and need effective preprocessing and sufficient feature selection procedures.
The procedure is a time-consuming operation that requires a precise evaluation of the information provided by numerous algorithms once the parameters governing the automated segmentation process have been analyzed [9]. The evaluation task must also take into consideration how the system parameters are set. Lung tissue may be categorized as healthy or pathological, which can be used to identify diseased regions and diagnose neurological conditions. Segmentation can be conducted to delineate cancer and edema. To examine distinct forms of lung tumors, cancer and edema are separated [10].
Soft computing and Machine Learning (ML) based approaches have emerged as a top option among researchers in image processing and signal processing applications. Soft computing techniques are given more priority because of advancements in very-large-scale integration built on low-power execution [11]. Most real-time applications use a threshold value to accomplish segmentation on image characteristics. For texture-based and histogram segmentation approaches, the generated histogram is considered for analysis. Based on the observations, the threshold value is adjusted, which is a complicated process requiring much background work [12].

Related Works
Computerized Tomography (CT), ultrasound imaging, and MRI are among the imaging modalities used in analyzing cancer detected in the breast, liver, lung, or other organs of the human body, with a reported accuracy of 92.14% [13]. However, this approach is limited in that measurements are supplied for only a limited number of images [14]. Because of the high level of spatial and anatomical variation in the environment surrounding lung tumors, automating the classification of lung cancers is a challenging endeavor; a combined VGG19 + CNN model reached an accuracy of 98.05% [15]. The DNN architecture is created by using several smaller kernels. According to the description, the neurons have small weights. The experimental results of that study show that the DNN achieves a higher accuracy of 98.01% when compared to all other methods that were considered state-of-the-art [16].
A novel lung cancer segmentation system based on multimodal features was presented, with an experimental accuracy of 94.33% [17,18]. In this system, a fuzzy method is used to classify and segment lung cancer. The fuzzy interaction technique is utilized mainly for lung segmentation. The unsupervised classification method creates a membership value for fuzzy controllers, with an accuracy of 86.21% [19]. Although the accuracy is significantly better, it is noted that the method does not fulfill the requirements, and there is some lag in the process. The adaptive histogram equalization technique compensates for this with higher contrast in the image. Subsequently, segmentation using the fuzzy c-means algorithm separates the overall appearance of the lung by retrieving global characteristics to distinguish normal and abnormal cells [20].
One of the most difficult challenges in medical image processing is the segmentation of lung images. Image segmentation is essential in many image-processing applications [21,22]. The requirement for extreme precision when dealing with human life drives the automated categorization and identification of malignancies in medical images [23]. It has been quite effective in evaluating lung images. Technically, the technique develops a parametric model that considers the chosen attributes. Deep neural networks (DNNs) have recently received much interest in this research context and play a predominant role [24][25][26][27][28][29][30][31].

Proposed Methodology
The structure of the proposed system for classifying lung MRI images for disease diagnosis using a DNN (based on a DWAE) is shown in Fig. 1. The major difficulty is that the collected data are complex and non-stationary, with heavy noise [32]. This can be overcome by employing a wavelet function as the nonlinear activation function to design a wavelet auto-encoder (WAE). The DWAE is then constructed from multiple WAEs. A significant proportion of the gathered images are stored in the Digital Imaging and Communications in Medicine (DICOM) format, a file format developed for medical imaging. The data must first be processed to extract the images from these DICOM files. Each image is then pre-processed and an image matrix is constructed. Next, the images are subdivided into small sub-arrays. Finally, the DWAE is applied and the encoded image is processed with the DNN. The resulting image set is reconstructed and the output is obtained.
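As an illustration of the wavelet auto-encoder idea described above, the following minimal sketch runs one forward pass of a single WAE layer. It assumes a real-valued Morlet wavelet as the activation, a linear decoder, and randomly initialized weights. The text does not specify the wavelet family, the layer sizes, or the training procedure, so the names `morlet`, `wae_forward`, and `init_matrix` and all numeric choices are hypothetical.

```python
import math
import random

def morlet(t):
    """Real-valued Morlet wavelet, assumed here as the hidden activation."""
    return math.cos(5.0 * t) * math.exp(-0.5 * t * t)

def wae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode x with a wavelet-activated hidden layer, then decode linearly."""
    hidden = [morlet(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_enc, b_enc)]
    recon = [sum(w * h for w, h in zip(row, hidden)) + b
             for row, b in zip(w_dec, b_dec)]
    return hidden, recon

def init_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_in, n_hidden = 8, 3                      # hypothetical sizes for one sub-array
w_enc = init_matrix(n_hidden, n_in, rng)
w_dec = init_matrix(n_in, n_hidden, rng)
x = [rng.random() for _ in range(n_in)]    # stand-in for a flattened sub-array

hidden, recon = wae_forward(x, w_enc, [0.0] * n_hidden, w_dec, [0.0] * n_in)
# Reconstruction error that training would minimize.
mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / n_in
```

Under these assumptions, a deep variant would feed `hidden` from one layer into the next layer's encoder, and the final code would be passed to the classifier.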

Figure 1: A DICOM image classification based on a DWAE-DNN model
Because there are many images, they have been broken up into smaller sub-arrays to enhance performance. When these image sub-arrays are processed using Data Weighted Averaging (DWA), the encoded images are produced. The data set used in this study includes lung MRI scans of 153 people (both healthy and with lung malignancies) referred to imaging facilities due to headaches, with a total of 1892 images, of which 1666 were used for training and 226 for testing. Following the doctor's inspection and diagnosis, 80 of these people were found to be healthy; they account for 1321 images, of which 56 were test data and the remainder were used for training. The other 73 are cancer patients, accounting for a total of 571 images. The ages of the people detected with lung cancer ranged from 8 to 66 years, with 86 females and 68 males among them.
One clustering method is central clustering. An iterative technique repeatedly seeks cluster centers, each of which is the mean position of its cluster. It then assigns each data sample to the cluster whose center is nearest, for a given number of clusters. The cluster centers are updated according to the similarity of their members, which ultimately leads to the formation of new clusters. In its most basic form, the technique starts by selecting the cluster centers at random. Features are extracted from the data using a first-order clustering technique. Fig. 2 presents the clustering approach applied to the image.
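The splitting of an image matrix into smaller sub-arrays can be sketched as follows. This is a minimal illustration that assumes square, non-overlapping tiles and an image whose sides are multiples of the tile size; the tile size used in the study is not stated, and the helper names `split_into_subarrays` and `reassemble` are hypothetical.

```python
def split_into_subarrays(image, tile):
    """Split a 2-D list into non-overlapping tile x tile blocks, row-major."""
    rows, cols = len(image), len(image[0])
    if rows % tile or cols % tile:
        raise ValueError("image size must be a multiple of the tile size")
    tiles = []
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            tiles.append([row[c:c + tile] for row in image[r:r + tile]])
    return tiles

def reassemble(tiles, rows, cols, tile):
    """Inverse of split_into_subarrays: rebuild the full image from its tiles."""
    image = [[0] * cols for _ in range(rows)]
    per_row = cols // tile
    for idx, block in enumerate(tiles):
        r0, c0 = (idx // per_row) * tile, (idx % per_row) * tile
        for dr, block_row in enumerate(block):
            image[r0 + dr][c0:c0 + tile] = block_row
    return image

# Toy 4 x 4 "image" standing in for a pre-processed image matrix.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = split_into_subarrays(image, 2)
restored = reassemble(tiles, 4, 4, 2)
```

Each tile can then be encoded independently and the outputs reassembled, which mirrors the reconstruction step at the end of the pipeline.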
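The center-based clustering procedure described above resembles k-means. The sketch below applies it to one-dimensional pixel intensities. It is an illustration under that assumption, not the exact algorithm of the study; the function name `kmeans_1d` and the sample intensities are hypothetical.

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Cluster scalar values into k groups by alternating assign/update steps."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)          # random initial cluster centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each sample to the nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members (keep it if empty).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:           # converged
            break
        centers = new_centers
    return sorted(centers)

# Hypothetical intensities: a dark background group and a bright group.
intensities = [10, 12, 11, 13, 200, 205, 198, 202]
centers = kmeans_1d(intensities, k=2)
```

For well-separated groups like these, the two centers settle on the mean of the dark pixels and the mean of the bright pixels.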

Image Segmentation
Image segmentation is the process of classifying or dividing image pixels into groups or areas based on the pixels' characteristics. Based on some resemblance, each pixel in the image is assigned to one of the areas. Pixels in one zone have virtually identical values, whereas pixels in the adjacent region have different values. Fig. 4 depicts pixels with similar values clustered in one section and pixels with different values organized in other parts. Threshold segmentation is used in the proposed strategy.
In threshold segmentation, pixels are assigned to regions depending on the threshold value 'T'. If the i-th pixel of image 'X' has a value greater than the specified threshold, the algorithm places it in one region; otherwise, the pixel is placed in the other region. For example, in the binarized MRI image X, the pixels with values higher than the threshold T are kept in the brighter region, while the remaining pixels are kept in the darker region. Eq. (1) represents the region 'R' as the combined representation of both segments, where 'I' represents the image pixel, 'N' denotes the number of pixels, and 'T' represents the threshold value. An anisotropic filter is employed before threshold segmentation in the proposed approach. This boosts the image clarity and improves the image's textural quality. Color, transparency, and reflectivity are the texture properties that give an object on the screen a realistic appearance. Fig. 5 shows how an anisotropic filter was applied to a skull-stripped image, followed by threshold segmentation and morphological operations. Finally, the bounding box highlights the segmented area.
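The thresholding rule and the final bounding-box step can be sketched as follows. This is a minimal example on a toy image. The threshold of 128 and the pixel values are hypothetical, and the anisotropic filtering and morphological steps mentioned in the text are omitted.

```python
def threshold_segment(image, t):
    """Binary mask: 1 where the pixel value exceeds t (bright region), else 0."""
    return [[1 if px > t else 0 for px in row] for row in image]

def bounding_box(mask):
    """Smallest (top, left, bottom, right) box enclosing all 1-pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

# Toy image: a bright 2 x 2 patch on a dark background.
image = [
    [10, 12, 11, 9],
    [13, 200, 210, 10],
    [12, 190, 220, 11],
    [9, 10, 12, 10],
]
mask = threshold_segment(image, t=128)
box = bounding_box(mask)
```

Here the four bright pixels form the segmented region and `box` encloses exactly that patch.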

Classification
One of the most challenging tasks is categorizing the data to aid in prediction and in generating results. A K-fold cross-validation approach helps divide the dataset into testing and training sets (with 'K' representing the number of groups into which the data sample is split). Cross-validation builds prediction models on the training sets and evaluates them on the testing sets. The Support Vector Machine (SVM) is a set of related supervised learning methods that are often used for regression and classification. An SVM can minimize the empirical classification error while maximizing the geometric margin, which is considered one of its distinguishing features. This property is known as Structural Risk Minimization (SRM). The SVM maps the input vector to a higher-dimensional space in which the maximum-margin separating hyperplane is constructed. Two parallel hyperplanes are constructed, one on either side of the separating hyperplane. The separating hyperplane is chosen to maximize the distance between these two parallel hyperplanes. The vector 'W' is perpendicular to the separating hyperplane. The margin is expected to increase if the offset parameter 'b' is included. Without this offset parameter, the hyperplane is forced to pass through the origin, which restricts the solution.
A comparison of the accuracy of various image segmentation techniques in the real world is shown in Fig. 6. According to the results, threshold segmentation has the best accuracy. The similarity between individual pixels in an image is used as the basis for producing distinct segmented regions, which draws attention to relatively small areas. To achieve a higher level of precision, several morphological techniques may be used to cut off a portion of the image without completely obliterating it. The comparison between the categorization result and the ground truth image yields values of more than 95%, as shown in Fig. 6.
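The K-fold split and the geometry of the SVM margin can be illustrated with a short sketch. It assumes contiguous folds without shuffling and a linear SVM in canonical form, where the two parallel hyperplanes satisfy w·x + b = ±1 and the margin width is 2/||w||. The function names and the sample values are hypothetical.

```python
import math

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def svm_decision(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

folds = list(k_fold_indices(10, 3))        # 10 samples, K = 3
w, b = [3.0, 4.0], -1.0                    # hypothetical trained parameters
score = svm_decision(w, b, [1.0, 1.0])
width = margin_width(w)
```

Each sample appears in exactly one test fold, so every sample is used for both training and testing across the K rounds. Shrinking ||w|| widens the margin, which is why margin maximization is posed as minimizing the norm of 'W'.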

Figure 6: Segmentation and ground truth image comparison

Results and Discussion
In a few situations, specific patches of fat in the images are misidentified as cancers, or the cancers are not visible to the physician, so the outcome depends entirely on the physician's ability. The DWAE-DNN was utilized in this work to identify cancers in lung scans. The image margins were clipped to avoid image noise. Image feature extraction is implemented and combined with the DWAE-DNN to boost the network's accuracy. With an accuracy of 98.67%, the DWAE-DNN correctly categorized the images as belonging either to a patient diagnosed with a tumor or not. Clustering, a method for feature extraction, is combined with the DWAE-DNN to improve network performance in such situations. This decision was made based on the results that the DWAE-DNN obtained from the first images. To determine whether the proposed strategy is beneficial, several other classifiers were used within the DWAE-DNN architecture. In addition, the performance of each classifier was validated using the criteria of accuracy, precision, specificity, and sensitivity. According to Table 1, the DT classifier used to assign categories to images had a DNN accuracy of 98.67%. When using the RBF classifier, the DNN has an accuracy of 97.34%, while the DT classifier has an accuracy of 94.24%. The test results showed that the proposed strategy increased accuracy to 99.12%. According to the findings, out of the 226 images used as test data, three images were misclassified, as shown in Fig. 7.
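The four evaluation criteria named above follow directly from the confusion-matrix counts. The sketch below computes them, using hypothetical counts that are not taken from Table 1. It also works through the accuracy arithmetic for a 226-image test set: three errors correspond to 223/226, or 98.67%, while 99.12% corresponds to 224/226, that is, two errors.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics from true/false positive/negative counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # recall, or true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

def accuracy_percent(correct, total):
    """Accuracy as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=45, tn=50, fp=2, fn=3)

three_errors = accuracy_percent(226 - 3, 226)   # 223 correct of 226
two_errors = accuracy_percent(226 - 2, 226)     # 224 correct of 226
```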
Table 2 displays the findings obtained from the DWAE-DNN analysis by applying the proposed approach to the dataset. In contrast to the conventional DNN, the newly presented method achieved an accuracy of 99.12% on the test data, a significant improvement over the older approach. The images were tested with the dataset discussed earlier, and the standard DWAE-DNN is used for classification after adopting the proposed technique. Fig. 8 presents the network accuracy for the tested images. The convolutional neural network is used in this study to accomplish effective automated lung cancer identification [33][34][35]. The Python programming language is used for simulation. After the precision is computed, it is compared with the other state-of-the-art methods currently in use. One such method classifies lung cancers using the SVM technique: features are extracted, and classification accuracy is computed from the extracted components. The SVM-based process for identifying cancer and non-tumor images is time-consuming and imprecise. Hence, an alternative DWAE-DNN-based classification is proposed that does not need any distinct phase for feature extraction.
ML algorithms are validated using various approaches, such as regression, distribution, and re-sampling techniques, the latter including K-fold cross-validation. For the experimental study, mean skill scores are derived using these approaches and compared with other outcomes. The validation is carried out with statistical significance tests, which check whether the skill scores are drawn from the same distribution. Statistical tests are classified into two types: parametric and non-parametric. If the model is selected correctly, parametric testing can be highly beneficial. Non-parametric approaches, however, are most often used to assess machine learning algorithms. Non-parametric testing involves identifying the underlying distribution from the data provided without taking the structure of the distributions into consideration. Because these tests make few assumptions, they are frequently referred to as distribution-free tests.
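One simple distribution-free check on paired skill scores is an exact sign-flip permutation test. The sketch below is an example of a non-parametric test of this kind; the text does not name the specific test used in the study. The per-fold scores are hypothetical, as is the function name `paired_permutation_pvalue`.

```python
from itertools import product

def paired_permutation_pvalue(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis that both models' scores come from the same
    distribution, each paired difference is equally likely to be + or -.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:       # tolerance for float round-off
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical per-fold accuracies for two classifiers (5 folds).
model_a = [0.98, 0.99, 0.97, 0.99, 0.98]
model_b = [0.95, 0.96, 0.94, 0.97, 0.95]
p_value = paired_permutation_pvalue(model_a, model_b)
```

With only five folds, the smallest two-sided p-value this test can return is 2/32, so more folds or repeated cross-validation are needed before a small significance level can be reached.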
A 6-month observation period is usually recommended to detect illness signs. These illness symptoms are graded on a 6-point scale: severely impacted, partially impacted, unaffected, partly improved, substantially better, and entirely improved. In this study, around 200 patients were randomly assigned to an experimental therapy for observation. The final findings are shown in Fig. 11 in the form of distributions based on the statistically significant test results. This graph shows that the number of patients who are considerably worse off has increased while the number of patients who are significantly worse off has decreased, as shown in Table 3. However, there is minimal difference between partly improved and improved individuals.
The image has several qualities: intensity, contrast, noise, and darkness. Mathematical formulas are applied to derive the mean, standard deviation, entropy, skewness, and kurtosis. The precise range for these statistical parameters can sometimes be determined. Nevertheless, for the sake of coherence, there should be significant variation among the numbers, as seen in Table 3. Except for a few factors, all of the properties are consistent across all three images. Controlling darkness may increase contrast, but this cannot overcome certain limitations. Consequently, certain anomalies occur in the preprocessing stage but can be resolved after segmentation. The correctness relative to the ground truth is the primary focus since it reveals the segmentation's precision, homogeneity, and segmented area. The first indication that the segmentation result was excellent is that the homogeneity was more than 90%, as was the ground truth accuracy. Consequently, threshold segmentation achieves higher categorization, uniformity, and accuracy relative to the ground truth.
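The first-order statistics listed above can be computed from the pixel values alone. The following sketch uses the population definitions of variance, skewness, and kurtosis and the Shannon entropy of the intensity histogram in bits. These conventions are an assumption, since the text does not state which definitions were used, and the function name `image_statistics` is hypothetical.

```python
import math
from collections import Counter

def image_statistics(pixels):
    """Mean, standard deviation, entropy, skewness and kurtosis of pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n      # population variance
    std = math.sqrt(var)
    # Shannon entropy (bits) of the empirical intensity distribution.
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    if std == 0:
        # Skewness and kurtosis are undefined for a constant image;
        # 0.0 is returned here purely as a placeholder convention.
        skewness = kurtosis = 0.0
    else:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    return {"mean": mean, "std": std, "entropy": entropy,
            "skewness": skewness, "kurtosis": kurtosis}

# Toy flattened image: half black, half white.
stats = image_statistics([0, 0, 255, 255])
```

For this two-level toy image the distribution is symmetric, so the skewness is zero, and the entropy is one bit because the two intensities are equally likely.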

Conclusion
In this study, images are classified into two groups according to the patient-to-healthy subject ratio. After preprocessing, the images were fed into the DNN. Other classifiers, such as the RBF classifier and the decision tree (DT) classifier, have been used in the DNN framework to examine its performance. The work also assessed the network performance using accuracy, sensitivity, specificity, and precision. According to the classifier statistics, the DT classifier performs best in the DWAE-DNN. The proposed approach (DWAE-DNN) is found to classify the images, as either malignant or normal, with an accuracy of 98.67%. The results were obtained on test data consisting of 226 images altogether. By applying the recommended feature extraction approach to the DWAE-DNN, the accuracy of the recommended method is further increased to 99.12%, outperforming the standard DWAE-DNN. Because the physician's diagnosis is central to treatment, the high accuracy of the recommended method can aid doctors in diagnosing cancer and treating patients.

Figure 2: Applying the clustering algorithm to the image

Figure 3: Design for a single-layer DWAE

Figure 7: Images misclassified by the DWAE-DNN

Fig. 9 depicts the variety of cancer and non-neoplastic lung images. As a direct consequence, both the complexity and the computation time are reduced while the accuracy increases. Fig. 10 presents the accuracy of the lung cancer categorization output. The outcome of the classification is either cancerous lung or non-tumor lung, determined by the value of the probability score. The probability score associated with a normal lung image is the lowest possible score. Compared to a normal lung, a cancerous lung has the highest probability score.

Figure 10: Accuracy of lung tumor classification

Figure 11: Comparison of patient severity

Table 1: The outcomes of the DNN using test data images using classifier

Table 2: Comparison with the proposed method

Table 3: Segment area vs. real tumor area comparison