Assessment of dermoscopy images of skin lesions using U-Net segmentation for clinician teledermatology

Standard approaches to assessing skin lesions rely on medical pathology examination of dermoscopy images. Unfortunately, such diagnosis and treatment are often error-prone, even in the hands of accomplished health care professionals. Because skin cancer lesions develop rapidly, computational analysis is required. To minimize the probability of error, a segmentation step is needed on which multiple medical analyses can be conducted. These frameworks must generally be versatile and need the assistance of advanced computing power. The proposed system is tested on datasets of dermoscopy images showing clinical signs. The U-Net segmentation method provides strong segmentation results: IoU 94.37, Dice coefficient 88.11, precision 90.87, recall (sensitivity) 91.82, accuracy 94.55, loss 16.8, and F1-score 91.34.


Introduction
Skin cancer is an irregular proliferation of skin cells, most commonly occurring on sun-exposed skin, and is the most prevalent type of malignancy in humans. The most dangerous skin cancers are melanoma, squamous cell carcinoma, and basal cell carcinoma. Malignant melanoma, commonly called melanoma, is the most severe form of skin cancer [1]. It arises in melanocytes, the cells that produce melanin and give the skin its color. Its incidence is rising worldwide, particularly among women under the age of 40. However, this cancer can often be treated successfully if it is detected early. Doctors typically inspect and document their patients through visual examination assisted by hand-held dermoscopy. Visual analysis can capture both the macroscopic and the microscopic view: the macroscopic view is the optical close-up, while the microscopic view is obtained with dermoscopy [2].
Dermoscopy is a way to examine the skin using bright light and magnification, often together with polarizing filters or immersion fluids to minimize surface reflections. Dermoscopy alone has improved the rate of diagnosis by visual examination. The dermoscopy image plays an important role in decision-making to support the diagnosis of a disease, because the image offers many details for study. Quantitative analysis of digital images enables high-throughput processing, such as image segmentation tasks. Segmentation is the most common prerequisite for automatic image processing, and most researchers delineate and analyze each lesion in a study to help clinicians measure how patients respond to different treatments [3].
Image segmentation has recently become a crucial step in establishing the scale and ratio measurements used for clinically relevant purposes. Automated segmentation is difficult to produce because of wide differences in texture, shape, and colour. Various image-processing methods are available, such as region growing, thresholding, morphological operations, and k-means clustering, but most offer limited accuracy (IOP Publishing, doi:10.1088/1757-899X/1175/1/012015). In fact, image noise and background often make thorough and accurate segmentation impossible; in addition, artifacts may be produced during image processing and the contrast is weak [3]. Currently, the preferred approach is to segment the lesion with deep learning. One deep-learning architecture for image segmentation is the U-Net. In general, this approach uses a downsampling encoder and an upsampling decoder, with separate parameters at each level. The architecture can process much larger samples without requiring extra effort, and it is suitable for batch processing and automated large-scale recording. U-Net is more competitive than existing methodologies in terms of architecture and pixel-based image segmentation generated by convolutional neural network layers, and it is also efficient on small image datasets. The value of this network is that it can not only effectively segment the targeted lesion and efficiently, objectively examine the dermoscopy image of the skin lesion, but also help increase the accuracy of medical image assessment. This study consequently provides a systematic review of U-Net-based medical image segmentation, focusing on the effective implementation of U-Net segmentation for medical imaging systems.

Related Works
Building on the same basic principle, deep learning is the most preferred image-segmentation method for this problem. To segment the lesion region, features are trained systematically in deep learning because such models achieve significant metric scores when segmenting the region, although they require a large amount of annotated data. The Convolutional Neural Network (CNN) has proven capable and efficient on large amounts of data. A previous study by C. Hernandez et al. [4] used a Feature Pyramid Network (FPN) and merged VGG-style neural nets to predict the cell mask. F. Araújo et al. [5], meanwhile, used a CNN to estimate the irregular region of cells in the Pap test.
In addition, Bi et al. [6] proposed segmenting skin lesions using multi-stage fully convolutional networks (FCNs). The multi-stage method involved localized coarse-appearance learning in the early stage and detailed boundary-characteristic learning in the next stage. In their work, a parallel integration strategy was used to merge the outcomes, which they reported to improve detection. Their system achieved around 90.66% on the PH2 dataset and still shows only modest improvement. T. Tran et al. [7] used SegNet on white blood cells (leukocytes) and red blood cells (erythrocytes) in peripheral blood tests, reporting a global accuracy of 89.45%. For multi-cell or cell-nuclei segmentation, U-Net is the most recommended approach and consistently provides state-of-the-art segmentation outcomes.
In comparison, R. Hollandi et al. [8] used two approaches, R-CNN and U-Net, to estimate cell nuclei, and the proposed algorithm was used to measure the level of accuracy. In the analysis of Pan et al. [9], simplified CNN-based reconstruction techniques were used, in which the nuclei are roughly segmented from the background. In the work of Yang et al. [10], a Fully Convolutional Neural Network (F-CNN) was used with an iterative k-terminal cut algorithm for the segmentation of glial cells. The F-CNN used a fully connected layer after a convolution layer at the end, while the other layers extract the necessary features from the input data, making this method well suited. Akram et al. [11] demonstrated a cell bounding-box concept and used an F-CNN for cell segmentation, achieving good accuracy based on spatial details. Hou et al. [12] applied a CNN to nuclei segmentation using a Generative Adversarial Network (GAN) module to generate weight-dependent synthetic patches. However, segmentation collapsed when nuclei were close to dark zones. Hatipoglu et al. [13] used the spatial and temporal connections of extracellular and cell pixels; their method can segment cells by generating a number of pixels in a window framework to fit neighboring pixels. U-Net is the most favored approach, since it has brought a major change in the segmentation and classification of medical images.
There are three reasons U-Net is used for this work. First, the ambiguity of lesion boundaries makes it very hard to differentiate the lesion from surrounding tissue, with tissue heterogeneity further contributing to under- or over-segmentation. Second, examinations of different dermoscopic skin lesions using different imaging protocols lead to wide variations in signal intensity. Third, skin lesions have a wide variety of sizes and tissue textures, due either to anatomical differences between patients or to the presence of pathology.

Dataset
The dermoscopic skin-lesion data are provided by the PH2 dataset [14]. There are three unique dermoscopic diagnoses of skin lesions: melanoma, nevus, and seborrheic keratosis. Melanoma is a malignant skin tumor derived from melanocytes (melanocytic); it can be seen in Figure 1. Nevus is a benign skin tumor derived from melanocytes (melanocytic) and can be seen in Figure 2. Seborrheic keratosis is a benign skin tumor derived from keratinocytes (non-melanocytic) and can be seen in Figure 3. The lesion information contains the original image combined with a gold-standard (definitive) diagnosis covering seven groups (melanocytic nevi, melanoma, benign keratosis, basal cell carcinoma, actinic keratoses, vascular lesions, and dermatofibroma), together with the so-called ground truth. This dataset includes 200 images as training data, of which 160 images are naevi (atypical naevi and normal naevi) and 41 images are melanoma, and each lesion is accompanied by a binary mask identified by an expert. These images are resized to a uniform dimension for training. U-Nets were chosen due to their proven ability to segment microscopic images.
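A minimal preprocessing sketch for the images and expert masks described above is given below. The target size `IMG_SIZE` and the nearest-neighbour resizer are assumptions for illustration, since the paper does not state the exact resized dimension; in practice a library resizer (e.g. OpenCV or PIL) would be used.

```python
import numpy as np

IMG_SIZE = 128  # assumed target size; the paper does not state the exact dimension

def resize_nearest(arr, size):
    # minimal nearest-neighbour resize (a stand-in for a library resizer)
    h, w = arr.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return arr[rows][:, cols]

def preprocess(image, mask):
    # scale RGB pixel values to [0, 1] and binarise the expert mask
    image = resize_nearest(image, IMG_SIZE).astype(np.float32) / 255.0
    mask = (resize_nearest(mask, IMG_SIZE) > 127).astype(np.float32)
    return image, mask
```

Resizing the image and its binary mask with the same function keeps lesion pixels aligned between the two.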

The Structure of U-Net Models
U-Net is known as a pioneering semantic segmentation network that has generated impressive results in the segmentation of medical images. The architecture can broadly be thought of as an encoder network followed by a decoder network. Through pooling layers, the encoder progressively reduces the spatial dimension; the decoder then progressively recovers the object information and the spatial dimension. In addition, there are skip connections from encoder to decoder that allow the decoder to better recover object information. Being highly invariant, the U-Net provides stable and precise segmentation of the lesion shape, where image features are combined to obtain high-resolution data. The pooling operator is replaced by an upsampling operator to increase the output resolution, and adding more layers helps extract additional input features and significantly improves accuracy. The overall U-Net architecture can be seen in Figure 1, and its breakdown can be seen in Figure 4. Each mechanism consists of two convolutional layers, and the number of channels grows from 1 to 64; as convolution proceeds, the depth of the image increases. The red arrow pointing downward is the max-pooling process, which halves the size of the image. In the original U-Net the spatial size also shrinks slightly due to unpadded convolutions, but this implementation uses 'same' padding. The process is repeated three more times, as shown in Figure 6.
As can be seen in Figure 6, the U-Net then reaches the bottom of the architecture. Two convolutional layers are still built, but with no max pooling; the image at this bottleneck has been reduced to its smallest size. The next step is the expansive path, shown in Figure 7. In the expansive path, the image is upsized back toward its original size using transpose convolution. Transpose convolution is an upsampling technique that expands the size of images: it pads the original image and then applies a convolution operation. After the transposed convolution, the spatial size of the image is doubled. At this stage, the image is concatenated with the corresponding image from the contracting path, combining information from the previous layers in order to obtain a more precise prediction. The top of the architecture is the last step, which reshapes the image to meet the prediction requirements; the last layer is a convolution layer with a single filter. The rest proceeds as in ordinary neural network training. During training, deep models based on the popular U-Net architecture are used. Adam optimization was chosen to update the weights, with binary cross-entropy as the loss function. In this method, the loss function combines binary cross-entropy and binary Dice loss to train the model; the Dice loss pushes the segmented lesion to match the ground truth.
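The contracting and expansive paths described above can be traced as a shape walkthrough. The sketch below is not a trainable network: `conv_block` is a placeholder that only changes the channel count (standing in for two 3x3 'same'-padded convolutions), while pooling, upsampling, and skip-connection concatenation are real. The four-level depth and 64-to-1024 channel progression follow the standard U-Net layout; the 128x128 input size is an assumption.

```python
import numpy as np

def max_pool2(x):
    # 2x2 max pooling: halves the spatial dimensions
    h, w, c = x.shape
    return x[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample2(x):
    # nearest-neighbour upsampling: doubles the spatial dimensions
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv_block(x, out_channels):
    # stand-in for two 3x3 'same' convolutions: only the channel count changes
    h, w, _ = x.shape
    return np.zeros((h, w, out_channels))

def unet_shapes(x):
    # contracting path: each level doubles the channels, then halves the size
    c1 = conv_block(x, 64);   p1 = max_pool2(c1)
    c2 = conv_block(p1, 128); p2 = max_pool2(c2)
    c3 = conv_block(p2, 256); p3 = max_pool2(c3)
    c4 = conv_block(p3, 512); p4 = max_pool2(c4)
    # bottleneck: smallest spatial size, most channels
    b = conv_block(p4, 1024)
    # expansive path: upsample, concatenate the skip connection, convolve
    u4 = np.concatenate([upsample2(b), c4], axis=-1);  d4 = conv_block(u4, 512)
    u3 = np.concatenate([upsample2(d4), c3], axis=-1); d3 = conv_block(u3, 256)
    u2 = np.concatenate([upsample2(d3), c2], axis=-1); d2 = conv_block(u2, 128)
    u1 = np.concatenate([upsample2(d2), c1], axis=-1); d1 = conv_block(u1, 64)
    # final convolution with a single filter produces the one-channel mask
    return conv_block(d1, 1)

mask = unet_shapes(np.zeros((128, 128, 1)))
print(mask.shape)  # (128, 128, 1)
```

Because concatenation requires matching spatial sizes, the skip connections only line up when every pooling step is exactly undone by an upsampling step, which is why the input side length should be divisible by 16 at this depth.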

Loss Function
Semantic segmentation of an image is effectively pixel-level classification. The Dice loss pushes the segmented skin-lesion target to match the ground truth as closely as possible. The Dice coefficient is given in Eq. (1):

Dice = 2|X ∩ Y| / (|X| + |Y|)   (1)

where X is the ground-truth mask of the skin-lesion image and Y is the predicted mask. To strengthen the penalty when the Dice coefficient is low, a logarithmic (log) operation is applied and the negative value is taken as the loss function, i.e. the Log-Dice loss shown in Eq. (2):

L_LogDice = -log(Dice)   (2)

In addition, Focal Loss can suppress the contribution of easily segmented pixel regions to the model loss, thereby emphasizing regions that are not easily segmented. It can be written as in Eq. (3):

FL(p_t) = -(1 - p_t)^γ log(p_t)   (3)

where p_t is the model's estimated probability for the true class and γ ≥ 0 is the focusing parameter.
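Equations (1)-(3) can be sketched directly on flattened binary masks. The smoothing constant `eps` and the focusing parameter `gamma=2.0` are conventional choices assumed here, not values stated by the paper.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, eps=1e-7):
    # Eq. (1): Dice = 2|X ∩ Y| / (|X| + |Y|); eps avoids division by zero
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def log_dice_loss(y_true, y_pred):
    # Eq. (2): -log of the Dice coefficient sharpens the penalty for poor overlap
    return -np.log(dice_coefficient(y_true, y_pred))

def focal_loss(y_true, y_pred, gamma=2.0, eps=1e-7):
    # Eq. (3): (1 - p_t)^gamma down-weights easily segmented pixels
    p_t = np.where(y_true == 1, y_pred, 1 - y_pred)
    return np.mean(-((1 - p_t) ** gamma) * np.log(p_t + eps))
```

A combined loss, as described in the training setup, would simply sum the binary cross-entropy with `log_dice_loss` (or weight the two terms).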

4. Results and Discussion
The proposed U-Net framework efficiently extracts relevant features during the training process, and this architecture clearly outperforms traditional graph-based segmentation methods. The experiment is validated and tested using the PH2 dataset of 200 fixed-dimension images, to which newly transformed images are added after the image-augmentation process. The performance of the segmentation is evaluated using performance metrics based on accuracy, precision, recall, F1-score, and IoU, each with its own interpretation. In this work, accuracy means the ratio of accurately segmented lesion and background pixels to all pixels in the image. Precision, also called positive predictive value, represents the fraction of segmented skin-lesion pixels that belong to the lesion in the label images. Recall, also called sensitivity, is the percentage of labeled lesion pixels that are segmented by the proposed technique. The F1-score is the harmonic mean of precision and recall, and IoU is the overlap between the predicted values and the ground-truth values. With TP, TN, FP, and FN denoting true positives, true negatives, false positives, and false negatives, the metrics are given in Eqs. (5)-(9):

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (5)
Precision = TP / (TP + FP)   (6)
Recall = TP / (TP + FN)   (7)
F1-score = 2 × Precision × Recall / (Precision + Recall)   (8)
IoU = TP / (TP + FP + FN)   (9)

First, the parameters of the loss function are selected, and the best optimization method and loss function are chosen. Second, skin-lesion segmentation on the dataset is processed using the proposed semantic-segmentation model, and the best-performing trained network is chosen to boost the algorithm. The next step is to evaluate the efficiency of the enhanced model on the same dataset. In this experiment, Log-Dice loss and Focal loss are compared, with Adam as the optimizer. A massive quantity of data is required to create an efficient classifier, so image augmentation is expected to improve the performance of this network.
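The pixel-wise metrics in Eqs. (5)-(9) can be computed from a pair of binary masks as follows; note that for binary masks the Dice coefficient coincides with the F1-score, so it is included for completeness. The sketch assumes non-degenerate masks (at least one predicted and one true lesion pixel).

```python
import numpy as np

def segmentation_metrics(y_true, y_pred):
    # confusion-matrix counts over binary masks
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)          # Eq. (6)
    recall = tp / (tp + fn)             # Eq. (7), sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),          # Eq. (5)
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),  # Eq. (8)
        "iou": tp / (tp + fp + fn),                           # Eq. (9)
        "dice": 2 * tp / (2 * tp + fp + fn),                  # equals F1 for binary masks
    }
```

In practice these counts are accumulated over all test images before the ratios are taken, so that large and small lesions are weighted by pixel count.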
In this research, two augmentations are applied: one random rotation and one horizontal flip. The augmented images show that the validated segmentation performance is similar when object size is preserved. Figure 12 compares the train set, test set, and validation set at different epochs, and Figure 13 compares the metrics on the train set.
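The two augmentations can be sketched as a single function applied identically to the image and its mask, which is essential so the label stays aligned. Restricting rotation to multiples of 90° is an assumption made here to avoid interpolation artifacts; the paper does not specify the rotation angles.

```python
import numpy as np

def augment(image, mask, rng):
    # rotate image and mask together by a random multiple of 90 degrees
    # (right angles only, so pixel values are permuted, never interpolated)
    k = int(rng.integers(0, 4))
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # horizontal flip with probability 0.5, applied to both identically
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image, mask
```

Because both transforms only permute pixels, the augmented pairs keep the same lesion area and object size, consistent with the observation above.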
Figure 13 shows that on the train set, the higher the epoch, the smaller the loss value, while the Dice coefficient and IoU grow higher. However, the opposite holds on the test set, where the higher the epoch, the larger the loss value, as can be seen in Figure 14. After training the model on the training set, predictions are made on the unseen test set and the predicted output is visualized, as shown in Figure 17. The raw predicted outputs are rather blurry because the predicted pixel values lie in the range 0 to 1. To make the edge predictions clear, this work enhances the image by rounding each pixel value to 0 or 1. Figures 17(a) and 17(b) compare the predictions after this enhancement.
After visualizing the comparison of the predictions after enhancement, this work applies the mask. The output can be seen in Figure 18.
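The rounding and mask-application steps described above can be sketched as follows; the 0.5 threshold is the conventional rounding point for probabilities in [0, 1] and is assumed here.

```python
import numpy as np

def binarise(pred, threshold=0.5):
    # round soft probabilities in [0, 1] to hard {0, 1} labels for crisp lesion edges
    return (pred >= threshold).astype(np.uint8)

def apply_mask(image, mask):
    # keep only lesion pixels: background is zeroed out, as in the masked overlay
    return image * mask[..., None]
```

Applying the binarised mask to the RGB image yields a view in which only the segmented lesion remains visible, matching the overlay shown in Figure 18.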

5. Conclusion
In this paper, we have proposed U-Net models for the segmentation of dermoscopic skin-lesion images, and the experiment shows the segmentation results of the proposed approach. Metrics such as IoU, Dice coefficient, precision, recall (sensitivity), accuracy, and F1-score play a pivotal role in evaluating performance. The proposed architecture improves the performance of dermoscopic skin-lesion segmentation at every stage: visualization, training, testing, and validation. This study focuses only on the segmentation task using the U-Net architecture on a skin-lesion dataset. Further improvement can be made by fine-tuning the hyper-parameters of U-Net, and we hope that in the future the U-Net algorithm can be tested on other publicly available segmentation datasets in the medical and biology domains.