Deep Learning Approach for Medical Image Analysis

Localization of the region of interest (ROI) is paramount to the analysis of medical images to assist in the identification and detection of diseases. In this research, we explore the application of a deep learning approach to the analysis of some medical images. Traditional methods have been restricted by the coarse and granulated appearance of most of these images. Recently, deep learning techniques have produced promising results in the segmentation of medical images for the diagnosis of diseases. This research experiments on medical images using a robust deep learning architecture based on the Fully Convolutional Network (FCN)-UNET method for the segmentation of three types of medical images: skin lesion images, retinal images, and brain Magnetic Resonance Imaging (MRI) images. The proposed method can efficiently identify the ROI in these images to assist in the diagnosis of diseases such as skin cancer, eye defects and diabetes, and brain tumors. The system was evaluated on publicly available databases, namely the International Symposium on Biomedical Imaging (ISBI) skin lesion images, retina images, and brain tumor datasets, with over 90% accuracy and dice coefficient.


Introduction
Segmentation is the key process for identifying the ROI of a disease region to assist in the diagnosis of diseases. It is very important in medical imaging, where localization is paramount to the analysis of scans. Segmentation assigns each pixel to the part of the image it belongs to and produces an output for every pixel. Recent advances in machine learning methodologies have led to the development of deep learning techniques in the field of medical image analysis for the diagnosis of various diseases [1]. Diseases such as brain tumors, diabetic retinopathy, skin cancer, and liver tumors have been successfully diagnosed through the analysis of MRI scans, retina vessel images, skin lesion images, and liver tumor scans, respectively, using deep learning techniques [1]. Existing techniques for analyzing these images, such as handcrafted methods, have been limited by their time consumption and by the coarse and granulated appearance of most of these images [2]. The application of deep learning techniques to medical image analysis and segmentation has produced promising results in recent times. These approaches are, however, constrained by the scarcity of accessible labeled datasets for training the deep learning models for effective performance [3]. This study proposes a robust and efficient deep learning framework for the segmentation of medical images towards disease discovery and prognosis with limited training data. In this work, three different sets of medical images (retina images, brain tumor, and skin lesion datasets) have been explored to assess the performance of the deep learning framework.
Automated techniques based on traditional machine learning have been developed in the past for the imaging and analysis of medical images towards the diagnosis of diseases. These techniques have been limited in performance due to the complex visual appearance of these images. For example, difficulties have been experienced in the analysis of the nerve fiber layer of the optic disc and the surrounding retina [4]. The swollen optic disc may indicate conditions such as malignant hypertension, diabetic retinopathy, etc. The macula in the optic disk may be a circular region of 5.5 mm in diameter with a 17-degree center centered, or between 4.0 and 5.0 mm, sequential, and 0.53-0.8 mm lower than the middle of the optic disk [4]. Any variation in the location and the size of this macula from the normal form can be identified by an automated system. The proposed system efficiently performs the analysis of retina images and identifies the optic disc ROI. The system also performs analysis of skin lesion images and identifies and differentiates the ROI with melanoma from the nonmelanoma region. Lastly, the proposed system performs the analysis of brain MRI images to identify ROI with tumors and to separate images with tumors from images with no tumors.
The system utilizes a Fully Convolutional Network that is trained in an end-to-end manner directly from images, using only pixels and disease labels as inputs. Datasets containing both the training images and the corresponding labels of the three categories of diseases examined in this research have been employed for training the model. Each pixel is classified as either belonging to a disease region or not. Finally, performance evaluation metrics such as dice coefficient and accuracy have been utilized to evaluate the model.

Review of Related Works
Semantic segmentation has been utilized for the pixel-by-pixel categorization of medical images, including brain MRI images, dental images, and breast and liver lesion images.
Khagi and Kwon [5] applied a deep neural network for the classification of MRI images by grouping the pixels into particular classes and assigning a label to every pixel. According to them, the MRI images show the actual substance of the brain, which comprises three key constituents: white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). The developed system was capable of segmenting the brain into WM, GM, and CSF. Yauney et al. [6] developed a dental classifier based on two convolutional neural networks trained with dental image datasets for disease detection.
A pixel-wise deep learning approach that employed variants of Fully Convolutional Networks (FCNs) such as FCN-AlexNet, FCN-32s, FCN-16s, and FCN-8s has been used for the semantic segmentation of breast lesions [7]. The system used models pretrained on ImageNet and transfer learning to address data deficiency issues and was able to classify lesions into two categories, benign and malignant.
Bellver et al. [8] suggested a strategy for segmenting the liver and its lesions from CT scans using Convolutional Neural Networks (CNNs). They trained a detector to identify the lesions and combined its positive detections with the output of the segmentation network. The detector localizes the lesions so that false positives from the segmentation network can be eliminated. The segmentation architecture was based on the Fully Convolutional Network (FCN) architecture of Deep Retinal Image Understanding (DRIU). The system operates on feature maps of different resolutions to allow multiscale information to be processed and learned at different network stages. A UNET-based deep neural network called RIC-UNET was also proposed for the nuclei segmentation of cellular images. The system utilized a residual inception channel attention-based UNET. The system was tested on The Cancer Genome Atlas (TCGA) dataset [9].
A deep class-specific learning approach was also proposed [10] for the automatic segmentation of skin lesions. The technique learns the essential visual features of each class of skin lesions (melanoma vs. non-melanoma) individually. They used probability-based stepwise integration to combine the segmentation results from the distinct class-specific learning models. A segmentation technique using Full-Resolution Convolutional Networks (FrCN) was also used for the segmentation of skin lesions [11]. The FrCN approach directly learns the full-resolution features of each individual pixel without the need for pre- or postprocessing procedures such as eliminating artifacts, adjusting low contrast, or further enhancing the detected skin lesion borders. The framework was tested on two skin lesion datasets, namely, the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 Challenge and PH2 datasets [11].
Finally, a model for the segmentation of retinal vessel images based on a Deep Convolutional Encoder-Decoder Architecture was proposed in [12]. The method consists of encoder and decoder units. The system accepts a low-resolution retina image, which is processed by a series of convolution layers in the encoder section before being passed to the decoder section, which produces the final segmented output [12].

Deep Learning Methodology
The proposed method analyzes diagnostic images to examine and diagnose the disease. It adopts a supervised learning approach that takes both the training datasets and the ground truth labels as input. The entire training phase is carried out pixel-wise, with each pixel of a training image paired with the corresponding pixel of the ground truth label. The preprocessing stage is the first part of the system. It performs image cropping, resizing, and resampling to guarantee that both the training images and the ground truth labels have the same resolution and size. The input images are then sent to a Fully Convolutional Network for end-to-end learning with a dice loss function. The FCN-UNET network adopts a multistage methodology. Figures 1 and 2 show the architectural representation of a deep convolutional network and the adopted structure, respectively.
The entire system can be divided into the following components.

Data Preprocessing.
The datasets of clinical images used in this study include images of retina vessels, skin lesions, and brain MRI. These images were first preprocessed to resolve differences in size, scale, and resolution. Tasks such as cropping, resizing, and resampling were performed on the images before they were sent to the FCN-UNET network. Small image dimensions of 160 × 224 were used in this work, as the image size determines the dimensions of the input feature maps. The images are also normalized using the mean and standard deviation of their pixel intensity values. On-the-fly data augmentation was implemented to increase the number of training samples.
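The normalization and on-the-fly augmentation steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes per-image statistics and random flips as the augmentation, neither of which is specified in the text, and the function names `normalize` and `augment` are hypothetical.

```python
import numpy as np

def normalize(image):
    """Scale an image to zero mean and unit variance.

    Assumption: statistics are computed per image; the paper does not say
    whether per-image or dataset-level statistics were used.
    """
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + 1e-7)

def augment(image, mask, rng):
    """Apply the same random flips to an image and its ground truth mask.

    Assumption: flips stand in for the unspecified augmentation operations.
    """
    if rng.random() < 0.5:  # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, mask = image[::-1, :], mask[::-1, :]
    return image, mask

rng = np.random.default_rng(0)
# Synthetic stand-ins for a 160 x 224 RGB image and its binary mask.
image = rng.integers(0, 256, size=(160, 224, 3)).astype(np.uint8)
mask = (rng.random((160, 224)) > 0.5).astype(np.uint8)

norm = normalize(image)
aug_image, aug_mask = augment(norm, mask, rng)
```

Because the flips are drawn each time a sample is requested, the training set on disk stays the same size while the network sees a different variant on each pass.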

Network Architecture.
The FCN-UNET network utilizes an encoder-decoder architecture for end-to-end training and learning from the clinical images and their respective ground truth labels [13]. In the initial stage, the network uses the encoder to learn the general visual characteristics of the clinical images pixel by pixel. At the later stage of the encoder-decoder architecture, the network learns general lesion recovery details and also captures the lesion border information of the images. The general architecture is explained and discussed below. The first part of the network, the encoder, is made up of five blocks of layers, with each block comprising convolution layers, a ReLU activation function, and a pooling layer. The convolution layers perform feature extraction and generate feature maps from the input image.
The ReLU activation function employed is a nonlinear function used to transform the feature maps. It accepts the feature maps as input and transforms them so that the system can train and learn on them properly. The transformed output is then sent as input to the next level of convolution. The extracted feature maps are then classified pixel-wise for the final segmentation. This is illustrated in the equation below:

$$\text{conv layer} = (\text{filter}, \text{ReLU}). \tag{1}$$

The ReLU activation function uses the equation below:

$$f(x) = \max(0, x). \tag{2}$$

The function of the pooling layer is to reduce the size and resolution of the extracted feature maps. This reduces the complexity and the tendency to overfit and also decreases the processing time of the computation. The layer adopts the equation stated below:

$$\text{layer} = \text{maxpool}(\text{poolsize}). \tag{3}$$

The second part of the FCN-UNET, the decoder, learns the spatial features of the data for recovery and boundary positioning purposes. It restores the feature maps from the encoder stage to the original size of the input. Each block of the decoder section also contains convolution layers, the ReLU activation function, and an upsampling layer. It is also composed of five blocks of layers. The upsampling layers recover spatial features and the position of boundaries, while the convolution layers proceed with the extraction of features.
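To make the ReLU, pooling, and upsampling operations concrete, the sketch below applies them to a small single-channel feature map with NumPy. The 2 × 2 pool size and nearest-neighbor upsampling are assumptions chosen for illustration; the text does not state which settings the network uses, and the helper names are hypothetical.

```python
import numpy as np

def relu(x):
    """ReLU activation: negative values become zero."""
    return np.maximum(0.0, x)

def max_pool2x2(fmap):
    """2 x 2 max pooling with stride 2 on a single-channel feature map."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]  # drop an odd trailing row/column
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(fmap):
    """Nearest-neighbor upsampling by a factor of 2 in each dimension."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)

fmap = np.array([[-1.0,  2.0,  0.5, -3.0],
                 [ 4.0, -2.0,  1.0,  0.0],
                 [ 0.0,  1.0, -1.0, -1.0],
                 [-5.0,  3.0,  2.0,  6.0]])

activated = relu(fmap)           # encoder: nonlinearity
pooled = max_pool2x2(activated)  # encoder: halves height and width
restored = upsample2x(pooled)    # decoder: restores the original size

print(pooled)  # [[4. 1.] [3. 6.]]
```

Pooling keeps only the strongest response in each 2 × 2 window, so the restored map has the original size but not the original detail; this loss of detail is what the skip connection between encoder and decoder is meant to compensate for.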
There is a short skip connection between the encoder section and the decoder section. The short skip connection enables the output from the encoder section to be concatenated with the output from the convolution layers in the decoder section. This helps to restore the feature maps more completely. The final output of the decoder is sent to the Softmax classifier, which predicts the class of each pixel as illustrated in the equation below:

$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \tag{4}$$

where $z_i$ is the decoder output for class $i$ at a given pixel, $p_i$ is the predicted probability of that class, $n = 2$ is the number of classes, and the output is a two-channel probability image. The predicted segmentation therefore corresponds to the class with the highest probability at each pixel. The network architecture is described in Figure 1.
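The skip-connection merge and the per-pixel Softmax decision can be sketched in NumPy as follows. The feature maps are random stand-ins, the channel count of 8 is arbitrary, and the final projection to two classes is modeled as a hypothetical 1 × 1 convolution (a matrix product over the channel axis); none of these details come from the paper.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Stand-ins for same-resolution encoder and decoder feature maps (H, W, C).
encoder_features = rng.normal(size=(160, 224, 8))
decoder_features = rng.normal(size=(160, 224, 8))

# Skip connection: concatenate along the channel axis.
merged = np.concatenate([encoder_features, decoder_features], axis=-1)

# Hypothetical 1 x 1 convolution mapping 16 channels to n = 2 class scores.
weights = rng.normal(size=(16, 2))
logits = merged @ weights             # shape (160, 224, 2)

probs = softmax(logits)               # two-channel probability image
segmentation = probs.argmax(axis=-1)  # highest-probability class per pixel
```

Each pixel of `probs` holds two probabilities that sum to one, and `segmentation` is the resulting binary mask.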

Model Implementation and Training.
The system generally consists of two major parts that achieve segmentation via pixel-wise classification. The general layout is shown in Figure 2.
In the first section, the model is trained using a training dataset of skin lesion images.
The Deep Convolutional Encoder-Decoder Network learns from the images pixel-wise in an end-to-end manner. The first convolution layer in the encoder section extracts feature maps and learns from them. Downsampling is performed by the pooling layer on the extracted feature maps to reduce the size and resolution of the encoder feature maps. These are then sent to the decoder in the second section through the skip connection, where the downsampled feature maps are restored to the original size and resolution by the upsampling layers. In the encoder section, the visual appearance details of the lesion are captured and learned, while the location information of the lesion borders is learned in the decoder section. The downsampling and upsampling in the encoder and decoder sections efficiently carry out the process of feature learning and extraction. Finally, the feature maps are sent into a Softmax classifier for pixel-wise classification.
The Softmax module employs equation (4) to perform the segmentation process by classifying each pixel of the feature maps.
Figure 3 presents a flowchart of the architecture.

Datasets.
Three different medical image datasets are employed for the evaluation of the proposed system. These are described below.

Experiments on Skin Lesion Images.
ISBI 2018 includes 2000 training images with the experts' ground truth. The images have a maximum resolution of 1022 × 767. The ISIC Dermoscopic Archive provided this dataset [15]. It also includes 600 test images with the corresponding ground truth images. The input data are JPEG-format skin lesion images, while the ground truth is a PNG-format mask image. The ground truth labels are used together with the performance evaluation metrics to train the model and to evaluate it on the validation and test data.

Experiments on Retinal Images.
There are 87 training images with the corresponding ground truth labels in the retina image dataset. It also includes 40 test images with the corresponding ground truth images [16]. Data augmentation is applied to increase the volume of the dataset.

Experiments on Brain MRI Images.
The brain MRI images used in this work were taken from the Brain MRI Images for Brain Tumor Detection dataset [17].

Evaluation Metrics
Dice similarity coefficient (DSC), accuracy, and the dice loss function are the most common segmentation evaluation metrics used for performance evaluation. These metrics were used for model evaluation.
They are defined as follows.

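Dice Similarity Coefficient (Dice).
It measures the similarity between the ground truth and the automatic segmentation. It is defined as shown in the following equation [18]:

$$\mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN}, \tag{5}$$

where $TP$, $FP$, and $FN$ are the numbers of true positive, false positive, and false negative pixels, respectively.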

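Accuracy (Acc).
This calculates the proportion of true results (both positive and negative) in the total number of cases investigated. It is given by the following equation [19]:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{6}$$

where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positive, true negative, false positive, and false negative pixels, respectively.

Loss Function (Dice Loss). It uses the equation below [20]:

$$\mathrm{Dice\ Loss} = 1 - \mathrm{DSC}, \tag{7}$$

where $\mathrm{DSC}$ is the dice similarity coefficient.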
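As an illustration of how these three metrics relate, the sketch below computes them from a pair of small binary masks using the standard confusion-matrix definitions (dice = 2TP / (2TP + FP + FN), accuracy = (TP + TN) / total, dice loss = 1 − dice). It is a sketch under assumptions: it operates on hard 0/1 masks, whereas a loss used during training would normally be computed on the predicted probabilities, and the function names and the small `eps` smoothing term are hypothetical.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Return (TP, TN, FP, FN) pixel counts for two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, tn, fp, fn

def dice_coefficient(pred, truth, eps=1e-7):
    """Dice similarity; eps avoids division by zero for empty masks."""
    tp, _, fp, fn = confusion_counts(pred, truth)
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)

def accuracy(pred, truth):
    """Fraction of pixels classified correctly."""
    tp, tn, fp, fn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + tn + fp + fn)

def dice_loss(pred, truth):
    """Dice loss as one minus the dice coefficient."""
    return 1.0 - dice_coefficient(pred, truth)

truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0]])
pred = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0]])

print(confusion_counts(pred, truth))            # (3, 7, 1, 1)
print(round(dice_coefficient(pred, truth), 4))  # 0.75
print(round(accuracy(pred, truth), 4))          # 0.8333
print(round(dice_loss(pred, truth), 4))         # 0.25
```

In this toy example accuracy is higher than dice because the many correctly labeled background pixels count towards accuracy but not towards dice.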
Results and Discussion
The deep learning system was evaluated on datasets containing three sets of medical images. The results obtained are discussed below. Figure 4 illustrates the pixel-wise analysis of a skin lesion image. The proposed method performs segmentation of skin lesion images via pixel-wise classification. The result in Figure 4 shows how the pixels of a sample skin lesion image are grouped into categories by the proposed method. Column 4 of the diagram displays each image's confusion matrix.

Skin Lesion Analysis Results.
The image shows the results of pixel categorization: 4060 pixels correctly categorized as malignant, 25135 pixels correctly categorized as nonmalignant, 7 malignant pixels categorized as nonmalignant, and 2639 nonmalignant pixels categorized as malignant.
This gives an accuracy of more than 90 percent. The final segmentation output is shown in the next figure, where the segmented output produced by the proposed system is compared with the ground truth label.
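As a check on the arithmetic, the four pixel counts reported for this sample image can be inserted into the usual accuracy definition, (TP + TN) / (TP + TN + FP + FN). The snippet below is only a worked calculation on the reported numbers, with "malignant" taken as the positive class.

```python
# Pixel counts reported for the sample image (positive class: malignant).
tp = 4060    # malignant pixels labeled malignant
tn = 25135   # nonmalignant pixels labeled nonmalignant
fn = 7       # malignant pixels labeled nonmalignant
fp = 2639    # nonmalignant pixels labeled malignant

total = tp + tn + fn + fp          # 31841 pixels
sample_accuracy = (tp + tn) / total

print(total)                       # 31841
print(f"{sample_accuracy:.4f}")    # 0.9169
```

The result, about 91.7 percent, agrees with the statement that the accuracy for this sample exceeds 90 percent.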
Figure 5 shows some sample original images and the corresponding ground truth labels from the skin lesion test dataset that were used in the experiments. The segmentation result is very similar to the expected output in the ground truth labels.
This is reflected in the dice coefficient curve in Figure 6, with a coefficient score of over 90 percent. Figure 6 shows the dice coefficient of the system on the skin lesion dataset as well as the training loss curves. The curves show that the loss decreases as the dice coefficient increases.
This clearly reflects the dice loss function adopted by the system.
Overall, the system achieves a dice coefficient of more than 90 percent and a loss of less than 10 percent. The dice coefficient curve shows that the segmented output is very close to the expected outcome, also known as the ground truth. It can also be inferred that the system works efficiently.

Retina Image Analysis Results.
The deep learning approach correctly identifies and segments the optic disc on each retina image. The length and location of the optic disc are established for a suitable diagnosis. Figure 7 displays the predicted outcome from the system for some sample original images, together with the corresponding ground truth labels from the retina image dataset. The result is very similar to the expected output in the ground truth labels.

Brain MRI Analysis Results.
The deep learning approach correctly identifies and segments the region of interest of the brain tumor on each brain MRI image. The size and location of the ROI are established for a suitable diagnosis of brain cancer. Figure 8 displays the predicted outcome from the system for some sample original images, together with the corresponding ground truth labels from the brain MRI image dataset. The result is very similar to the expected output in the ground truth labels. Figure 9 shows the classification output of the system for sample brain MRI images with and without tumors. Table 1 summarizes the segmentation and analysis of some medical images using deep learning approaches. Table 1 indicates that the proposed system, with 93% accuracy and a 90% dice coefficient, performs better than previous studies that used deep learning methods on medical images. The proposed model was also tested on skin lesion, brain MRI, and retina images.

Computational Intelligence and Neuroscience
Figure 4: Confusion matrix diagram of a test sample of skin lesion image; the first column is the input image, the next column is the segmented output, and the third column is the pixel-wise classification output.
Figure panel titles: Original images, Ground truth labels, Predicted output.

Conclusion
This research investigated the application of a deep learning approach to medical images. An enhanced FCN-UNET method has been proposed for medical image analysis. To diagnose diseases such as skin cancer, brain tumor, and retina-related disease, the regions of interest of the disease areas were first segmented and identified. The proposed system has been tested on publicly available datasets. The performance was evaluated using metrics such as dice coefficient and accuracy. Overall, the system produced promising results, with accuracy and dice coefficient scores of more than 90%. It can be inferred that the system works efficiently. In future work, it is recommended that the images be preprocessed using probabilistic and fuzzy approaches [25,26]. This will further improve the general performance of the proposed model.

Data Availability
The Brain MRI Images for Brain Tumor Detection dataset used to support the findings of this study is available at https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection/.

Consent
Informed consent was obtained from all individual participants included in the study.