A Deep Learning-Based Transfer Learning Framework for the Early Detection and Classification of Dermoscopic Images of Melanoma

Low contrast between lesions and skin, blurriness, darkened lesion images, and the presence of bubbles and hairs are artifacts that make the timely and accurate diagnosis of melanoma challenging. In addition, the strong similarity between nevus lesions and melanoma makes melanoma difficult to identify even for expert dermatologists. In this work, a computer-aided diagnosis system for melanoma detection (CAD-MD) is designed and evaluated for the early and accurate detection of melanoma, using machine learning and deep learning-based transfer learning for the classification of pigmented skin lesions. The designed CAD-MD comprises preprocessing, segmentation, feature extraction, and classification. Experiments are conducted on dermoscopic images from the publicly available PH2 and ISIC 2016 datasets using machine learning and deep learning-based transfer learning models in two settings: first with the actual images, and second with augmented images. The best results on both the PH2 and ISIC-16 datasets are obtained on augmented lesion images. The performance of the CAD-MD system is evaluated using accuracy, sensitivity, specificity, Dice coefficient, and Jaccard index. Empirical results show that the deep learning-based transfer learning model VGG-16 significantly outperforms all other employed models, with an accuracy of 99.1% on the PH2 dataset.

Early detection is vital and can improve the prognosis of individuals suffering from malignant melanoma [1]. The diagnostic process has three steps: first, visual screening; second, dermoscopic investigation; and third, a biopsy followed by histopathological analysis [2]. Since melanoma lesions differ in appearance, shade, and area across skin types [3], the investigation of dermoscopic images is vital for the prognosis of melanoma at a very early stage [4,5].
Several studies are in progress on the classification of pigmented skin lesion images using various learning paradigms. Classifying lesion images with machine learning methods requires prior knowledge and relies solely on handcrafted features, which adds complexity. In the computer vision competition ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an error rate of 30% was observed when machine learning classifiers were used to classify around two million images into a thousand categories [6].
When the same competition was held in 2012, conventional machine learning methods obtained an error rate of 26-30%, which was reduced to 16.4% when image classification was performed using a convolutional neural network (CNN), a deep learning classifier [7]. The error rate was reduced drastically to below 5% using deep learning architectures in the ILSVRC competition held in 2017 [8]. Numerous pre-trained CNN architectures are employed for image classification, namely LeNet [9], AlexNet [10], ZFNet [11], VGGNet [12], GoogLeNet [13], ResNet [14], etc. Through transfer learning, pre-trained models offer a high capability for extracting features from images. Even when many images are generated through augmentation, the dataset is sometimes not large enough to train a model from scratch. This issue is overcome by transfer learning with pre-trained CNN architectures, which can enhance the classification accuracy of pigmented skin lesion images. However, the analysis of melanoma by experienced dermatologists often fails due to difficulties such as the color of hairs, the varying area, shade, and appearance of a mole, and the variance between normal skin and the lesion. These difficulties call for a computer-aided diagnostic (CAD) system to help experts classify a lesion as benign or malignant.
In this work, an automated computer-aided diagnosis system for melanoma detection (CAD-MD) is designed for the effective and accurate classification of pigmented skin lesions as benign or malignant. CAD-MD is designed to assist dermatologists and medical experts in identifying lesions. The contributions of the proposed system are as follows:
• An automated CAD-MD system is designed for the early and accurate detection of melanoma and the classification of lesions as benign or malignant.
• Image downsizing is performed to reduce the complexity of training and testing.
• Hair noise is removed using the digital hair removal (DHR) method.
• For easier image analysis, significant regions of interest (SROI) are extracted from the pigmented lesion images by segmentation using the Watershed algorithm.
• The gray-level co-occurrence matrix (GLCM) is used to extract features from the segmented pigmented skin lesion images.
• Two sets of experiments are carried out, with actual images and with augmented images, on the dermoscopic PH2 and ISIC-2016 datasets.
• Pigmented lesions are classified using machine learning and deep learning-based transfer learning models with tuned parameters on actual and augmented images.
• The performance of CAD-MD is evaluated on the PH2 and ISIC-2016 dermoscopic datasets based on accuracy, precision, recall, F1-score, and error rate.
The paper is organized as follows. Previous literature on the classification of dermoscopic skin lesion images is discussed briefly in Section 2. Section 3 elaborates on the proposed CAD-MD system, comprising the preprocessing, segmentation, feature extraction, and classification methods. Performance measures are presented in Section 4. Experimental results are reported in Section 5, while the discussion and the conclusion of the work are given in Section 6 and Section 7.

Related work
A deep CNN architecture is used in [3] for extracting the skin lesion from the images. Overall, the proposed methodology gives a promising accuracy of 92.8%. However, combining the features obtained from several pre-trained CNN networks may yield better classification results. The methodology proposed in [15] compares a one-level classifier with a two-level classifier. The best performance is achieved by the two-level classifier, with an accuracy of 90.6%. A novel classification system [16] is designed using the KNN technique for classifying skin lesions. The designed system achieved an accuracy of 93% during experimentation. However, further improvement is limited by the small size of the training set. An automated system [17] is designed using deep learning to detect melanoma and classify dermoscopic images as benign or malignant. The authors proposed a Synergic Deep Learning (SDL) model using Deep Convolutional Neural Networks (DCNN) to address the challenges caused by intra-class variation and inter-class similarity in skin lesion classification. They achieved an accuracy of 85.8%, which is not quite promising compared to the state-of-the-art methods discussed in that work. A new approach for the segmentation and classification of skin lesions is designed in [18], with a region growing technique used to segment and extract the lesion areas. Extracted features are then classified using SVM and KNN classifiers, and performance is measured using the F-measure, with a value of 46.71%. A number of ongoing works on designing automated pigmented skin lesion classification systems using deep and transfer learning might save medical experts' time and effort. One such CAD system is designed by [19], which combines hand-crafted features based on color, shape, and texture with deep learning features for detecting melanoma.
Similar work has also been reported by [20] using transfer learning. Another lesion classification system is designed by [21] using a pre-trained deep convolutional network. Other frameworks based on convolutional neural networks have also been proposed for the automated classification of skin lesion images by [22], [23], [24]. Several studies on melanoma detection and classification at an early stage have also been reported in [25][26][27][28][29]. Worldwide, several researchers are working on the problem of imbalanced datasets in the classification of melanoma as benign or malignant. One comprehensive work on the issue of skewed distribution has been conducted by [30], employing several oversampling methods including ADASYN, ADOMS, AHC, Borderline-SMOTE, ROS, Safe-Level-SMOTE, SMOTE, SMOTE-ENN, SMOTE-TL, SPIDER and SPIDER2, and undersampling methods including CNN, CNNTL, NCL, OSS, RUS, SBC, and TL. They investigated the effect of skewed class distribution on the employed learners and identified the best rebalancing method for several cancer datasets. Instead of using conventional data-level balancing approaches, [22] rebalanced the dataset by creating synthetic dermatoscopic lesion images for the under-represented class using unpaired image-to-image translation. [31] investigated the impact of an imbalanced dataset on classifying melanoma as benign or malignant, employing the oversampling strategies random oversampling (ROS) and SMOTE, and the undersampling methods Tomek link (TL), random undersampling (RUS), neighborhood cleaning rule (NCR), and NearMiss for balancing the dataset. They also used hybrid sampling methods for dataset rebalancing, including SMOTE+TL and SMOTE+ENN.

Limitation of the related work
The functionalities of the conventional CAD systems discussed in the literature include pre-processing, data augmentation, segmentation, feature extraction, and classification, which assist medical experts in detecting and interpreting the disease effectively. The majority of the literature falls short of designing such an effective system for cost-effective diagnosis where medical experts are not available. The problems of data limitation and skewed class distribution in dermoscopic datasets aggravate the difficulty of melanoma detection. The majority of the literature has focused on classification alone rather than on classification combined with a solution to the imbalance issue. Few studies reported in the literature have worked on dataset rebalancing. Among the approaches for balancing the uneven class distribution of skewed datasets, data-level approaches are used most frequently. The reported work rebalances the class distribution of the image dataset using oversampling and undersampling methods. However, oversampling methods generate duplicate samples from the minority class, which results in overfitting and leads to other issues such as class-distribution shift over several iterations. Undersampling methods are also frequently used in the literature for balancing the dataset. Although they balance the class distribution, they result in information loss because they eliminate samples from the majority class until the dataset is balanced.
Our proposed CAD-MD system overcomes the issues of data limitation and class skewness by using image augmentation techniques, which artificially enlarge the dataset without changing the semantic meaning of the actual images. Using data augmentation operations such as rotation, vertical and horizontal flips, and horizontal and vertical shear, we generated a new training dataset by transforming the current lesion images into new images belonging to the same class as the actual lesion images. The method minimizes the chance of overfitting and enhances the size and quality of the lesion image dataset.

Proposed CAD-MD System
This section discusses the proposed CAD-MD system designed for the effective and accurate detection and classification of pigmented skin lesions as benign or malignant. The framework of the designed CAD-MD system is shown in Figure 1.

Dataset Acquisition
PH2 dataset 1 comprises dermoscopy images obtained from the Dermatology Service of Hospital Pedro Hispano, Matosinhos, Portugal. The PH2 database includes expert dermatologists' assessments of several criteria: colors, pigment network, dots/globules, streaks, regression areas, and blue-whitish veil. Another dataset, ISIC-16 232 , is a collection of publicly available high-quality dermoscopic images of skin lesions. Figure 2 presents samples of pigmented lesions from the PH2 and ISIC datasets.

Preprocessing
Preprocessing of the images is twofold.

Image-Downsizing
At first, the images are resized. The original sizes of the PH2 and ISIC-16 images are 765 × 572 and 1022 × 767, respectively. These images were downsized to 120 × 160 to reduce the complexity and the computational time consumed during training and testing.

Hair Removal
Hair noise in the PH2 and ISIC-16 images might affect the accuracy, so it is removed by employing the DHR (Digital Hair Removal) algorithm. At first, the original color image is converted into a grayscale image. Then, to find the hair contours, morphological black-hat filtering is applied to the grayscale image, and the hair contours are intensified by thresholding in preparation for inpainting. Finally, a hair-free image is obtained by inpainting the original image according to the resulting mask. The inpainting algorithm starts from the boundary of the region to be inpainted and works inward, choosing at each step a pixel near the boundary. It replaces the pixel to be inpainted with a normalized weighted sum of its neighboring pixels, where pixels lying closer to the region receive higher weights. Once a pixel is inpainted, the algorithm moves to the next closest pixel using the fast marching method (FMM) [33]. The cv2.inpaint function of the cv2 library is used with the cv2.INPAINT_TELEA flag for inpainting. Figure 3 illustrates the process of hair removal using DHR [34].

Fig. 3. Hair removal using DHR, including the black-hat filtered image, (d) the intensified hair contours obtained by thresholding, and (e) the inpainted image with hair removed using the mask.

Data Augmentation
A major challenge with dermoscopic datasets is the small number of labeled lesion images. The main aim of augmentation is to obtain optimal performance with deep learning, which requires a huge number of training lesion images. To overcome the issues of data limitation and class skewness, augmentation is performed on the images after the hair-removal process. Applying image augmentation to the dataset does not alter the semantic meaning of the lesion images: the scale and position of a skin lesion within an image do not change its semantic meaning and do not affect classification [35]. Thus, the dataset is enlarged by transforming the original images to generate new images with the same labels as the originals, applying rotation, vertical and horizontal flips, and horizontal and vertical shear in the training set and testing set for each class of the ISIC-16 and PH2 datasets. The total numbers of images before and after augmentation are listed in Table 1.
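The augmentation operations named above can be illustrated with a small NumPy sketch. The exact rotation angles and shear factors are not given in the text, so the 180° rotation (chosen because it preserves the image shape) and the `shear_factor` value below are assumptions, as are the helper names.

```python
import numpy as np


def _shift_row(row, shift):
    """Shift a row along its first axis by `shift` pixels, zero-filling the gap."""
    out = np.zeros_like(row)
    if shift > 0:
        out[shift:] = row[:-shift]
    elif shift < 0:
        out[:shift] = row[-shift:]
    else:
        out[:] = row
    return out


def shear_horizontal(image, factor):
    """Shear by shifting each row proportionally to its distance from the center row."""
    height = image.shape[0]
    out = np.zeros_like(image)
    for r in range(height):
        shift = int(round(factor * (r - height / 2)))
        out[r] = _shift_row(image[r], shift)
    return out


def augment(image, shear_factor=0.2):
    """Return label-preserving variants of one lesion image (H x W or H x W x C)."""
    # A vertical shear is a horizontal shear applied to the transposed image.
    sheared_v = np.swapaxes(
        shear_horizontal(np.swapaxes(image, 0, 1), shear_factor), 0, 1
    )
    return {
        "rotate_180": np.rot90(image, 2),
        "flip_horizontal": np.fliplr(image),
        "flip_vertical": np.flipud(image),
        "shear_horizontal": shear_horizontal(image, shear_factor),
        "shear_vertical": sheared_v,
    }
```

Each variant keeps the label of the source image, which is how the augmented set can be grown per class to reduce skewness.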

Segmentation using the Watershed method
Watershed [36] is a region-based method that uses image morphology. The method comprises the following steps:
• Gradient magnitude of input image: It finds pixel boundaries, where the gradient is high.
• Dilation and Erosion: Several morphological operations, such as opening, closing, and decomposition of shapes, can be performed using dilation and erosion.
• Erosion: It removes minute infected regions, preprocessing the image so that the highly infected area remains.
• Dilation and Complement: It enlarges the object and fills small holes and narrow gulfs in objects.
Figure 4 shows sample images from both datasets before and after segmentation.

Feature Extraction using GLCM for Skin Lesion Detection
The gray-level co-occurrence matrix (GLCM) [37], or gray-level spatial dependence matrix (GLDM), is a statistical method used for analyzing the texture of an image. The GLCM counts how many times each distinct combination of gray-level values occurs in an image. Once the GLCM is created, numerous statistical measures can be derived from it, which give details about the texture of the image. Table 2 presents the features calculated from the GLCM in this experiment.
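To make the GLCM computation concrete, the following NumPy sketch builds a normalized symmetric GLCM for a single pixel offset and derives contrast, correlation, energy, and homogeneity from it using their standard definitions. The offset (one pixel to the right), the function names, and the number of gray levels are assumptions; the paper does not state which offsets or quantization were used.

```python
import numpy as np


def glcm(image, levels, dx=1, dy=0):
    """Normalized symmetric GLCM of an integer image for the offset (dy, dx)."""
    img = np.asarray(image)
    rows, cols = img.shape
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Count how often gray level a has gray level b at the given offset.
    for r in range(max(0, -dy), min(rows, rows - dy)):
        for c in range(max(0, -dx), min(cols, cols - dx)):
            counts[img[r, c], img[r + dy, c + dx]] += 1
    counts = counts + counts.T  # make the matrix symmetric
    return counts / counts.sum()  # normalize to probabilities


def glcm_features(p):
    """Texture features from a normalized symmetric GLCM p."""
    n = p.shape[0]
    i, j = np.indices((n, n))
    mu = np.sum(i * p)  # GLCM mean (same for rows and columns when symmetric)
    var = np.sum(p * (i - mu) ** 2)  # GLCM variance
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "correlation": float(np.sum(p * (i - mu) * (j - mu)) / var) if var > 0 else 1.0,
        # Energy here is the sum of squared entries, as described in the text.
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
    }
```

For example, `glcm_features(glcm(quantized_roi, levels=8))` would yield one four-value feature vector per segmented lesion, where `quantized_roi` is a hypothetical grayscale region quantized to 8 levels.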

Classification Methods
The following are the architectures used in performing the experiment.

Machine Learning Models

Support Vector Machine (SVM)
Support Vector Machine (SVM) is a binary hyperplane classifier. Given a labeled dataset, SVM generates an optimal hyperplane that separates the data into classes. To classify nonlinearly separable data, the data are mapped into a space where they become linearly separable; the functions that perform this transformation are called SVM kernels [38,39]. The SVM relies on structural risk minimization, whose goal is to find a classifier that minimizes the bound on the expected error [40]. In this research, the penalty parameter of SVM is set to C = 1.0, and a linear kernel is used with degree = 3, gamma = 1, and random_state = 0.

Random Forest (RF)
Random Forest is another machine learning classifier used for classification. It contains multiple decision trees, where the number of decision trees is fixed. The decision trees choose an optimal attribute to maximize the information gain at each level [41], so that the purity of the dataset is maximized. Random forest is considered a variant of Bagging that is used for the formation of decision trees [39,42]. In this experiment, the splitting criterion is set to 'gini', with max_depth and random_state = 'none' and n_estimators = 'auto'.

Table 2. Features calculated from the GLCM
Contrast: It quantifies the local variations in the GLCM. $\sum_{i,j=0}^{N-1} P_{ij}\,(i-j)^2$ (1)
Correlation: It quantifies the joint probability of occurrence of the pixel pairs. $\sum_{i,j=0}^{N-1} P_{ij}\,\frac{(i-\mu)(j-\mu)}{\sigma^2}$ (2)
Energy: It gives the sum of squared values in the gray-level co-occurrence matrix. $\sum_{i,j=0}^{N-1} P_{ij}^2$ (3)
Homogeneity: It quantifies how closely the elements of the GLCM are distributed to the GLCM diagonal. $\sum_{i,j=0}^{N-1} \frac{P_{ij}}{1+(i-j)^2}$ (4)
Where $P_{ij}$ = element $(i, j)$ of the normalized symmetrical GLCM, $N$ = number of gray levels in the image, $\mu$ = mean of the GLCM, and $\sigma^2$ = variance of the intensities of all reference pixels corresponding to the GLCM.

k-NN
K-nearest neighbors (KNN) [43,18] is one of the simplest machine learning algorithms. KNN is known for its lazy learning methodology and has typically been used in the literature for both classification and regression. A typical implementation uses the distance between data points in feature space as a metric to predict the class label in classification or the output value in regression. Some of the crucial parameters that must be considered before applying the algorithm are the distance threshold and the value of K, as they significantly affect the results of the model [44]. In this research, the number of nearest neighbors is set to n_neighbors = 3, using the Euclidean distance (p = 2).

Logistic Regression (LR)
Logistic Regression (LR) [45] is another long-established stochastic classification algorithm [46], which uses a combination of independent variables as features to perform classification. Unlike linear regression, which predicts a value in an unbounded range, logistic regression predicts the probability of an outcome in the (0,1) range and works on categorical outcomes [47,48,49,45]. As with SVM, the penalty parameter of LR is set to C = 1.0, with solver = 'liblinear' and random_state = 0.
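The four machine learning classifiers and the hyperparameters reported above can be instantiated with scikit-learn roughly as follows. This is a sketch under assumptions: the helper name `build_classifiers` is hypothetical, and because scikit-learn expects an integer for `n_estimators`, the reported value 'auto' is not passed and the library default is kept for Random Forest.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_classifiers():
    """Return the four classifiers configured with the reported settings."""
    return {
        # degree and gamma are ignored by the linear kernel but kept as reported.
        "SVM": SVC(C=1.0, kernel="linear", degree=3, gamma=1.0, random_state=0),
        # n_estimators is left at the scikit-learn default (assumption).
        "RF": RandomForestClassifier(criterion="gini", max_depth=None, random_state=None),
        # p=2 selects the Euclidean distance.
        "k-NN": KNeighborsClassifier(n_neighbors=3, p=2),
        "LR": LogisticRegression(C=1.0, solver="liblinear", random_state=0),
    }
```

Each model would then be fitted on the GLCM feature vectors, for example `model.fit(X_train, y_train)`, where `X_train` and `y_train` are hypothetical names for the feature matrix and the benign/malignant labels.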

Deep Learning-based Transfer Learning Models

CNN
Convolutional neural networks [50], or CNNs [51], are a kind of ANN for processing data with a known grid-like topology. Convolutional networks combine local receptive fields, parameter sharing, and spatial or temporal subsampling [52]. The convolution operation is typically denoted with an asterisk: $s(t) = (x * w)(t)$, where $x$ is the input and $w$ is the kernel.

Performance measures used for evaluation:
Sensitivity: Truly classified positive samples among the entire positive samples. $\frac{TP}{TP+FN}$ (2)
Specificity: Truly classified negative samples among the entire negative samples. $\frac{TN}{TN+FP}$ (3)
Dice coefficient: Used to gauge the similarity of two samples. $\frac{2\,TP}{2\,TP+FP+FN}$ (4)
Jaccard index: A statistic used for gauging the similarity and diversity of sample sets. $\frac{TP}{TP+FP+FN}$ (5)

LeNet-5
The LeNet [53,54] architecture was used primarily. The input size of each image is 120 × 160 pixels. The pixel intensities are normalized with respect to 255; a black pixel is associated with a pixel value of 0, whereas a white pixel is associated

VGG 16
The VGG-16 architecture is chosen as a classifier in this research work as it generalizes well to other datasets. The network's input layer requires an RGB image of 224 × 224 pixels. The input image is passed through five convolutional blocks, which use small convolutional filters of size 3 × 3. The number of filters varies between blocks. ReLU (Rectified Linear Unit) is applied in all hidden layers, where it acts as the activation function.

VGG 19
VGG-19 differs from VGG-16 in having a greater number of ConvNet layers, making it work faster compared to VGG-16. The greater the number of layers, the lower the learning rate and the loss function. A considerable amount of training data is required to restrain effects like overfitting (a consequence of the numerous free parameters), which increases the training time. Transfer learning overcomes this problem by fine-tuning pre-trained networks on a target database [58,59].

Fig. 9. ROC of VGG16 model for PH2 dataset

Inception V3
Inception is a module that is used in the GoogLeNet architecture. Inception is a convolutional neural network that recognizes structures in images. The main aim of the inception module is to serve as a multi-level feature extractor. This is achieved by computing 1 × 1, 3 × 3, and 5 × 5 convolutions within the same network module. The stacked outputs of these filters are then fed into the next layer of the network. Inception gains its strength by combining several convolutional filters of different sizes in the inception block [60].

Xception
Xception stands for Extreme Inception and follows the concept of the inception module. The architecture comprises 36 convolutional layers. For experimentation, Logistic Regression is used for classification. These 36 convolutional layers are grouped into 14 modules, all of which have residual connections around them except for the first and last modules. The Xception architecture is a stack of depthwise separable convolution layers with residual connections. This makes the architecture easy to understand and modify [61].

Performance Evaluation Measures
Several measures are used in this experiment for evaluating the performance of the employed classifiers, including accuracy, sensitivity, specificity, Dice coefficient, and Jaccard index [62]. Eq. 1-5 define the metrics for evaluating results. These metrics are obtained from the confusion matrix shown in Table 3, where TP, TN, FP, and FN stand for True Positive, True Negative, False Positive, and False Negative, respectively.
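As an illustration of how these measures follow from the confusion matrix, the sketch below computes them from the four counts using their standard definitions. The function names are hypothetical, malignant is assumed to be the positive class (label 1), and the code assumes every denominator is non-zero.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = malignant, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluation_measures(tp, tn, fp, fn):
    """Standard measures derived from the confusion matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
    }
```

For instance, with hypothetical counts TP = 40, TN = 45, FP = 5, and FN = 10, the accuracy is 85/100 = 0.85, the sensitivity is 40/50 = 0.8, and the specificity is 45/50 = 0.9.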

Experimental results
This section discusses the performance of the designed CAD-MD system and compares it with several state-of-the-art classification methods on the publicly available benchmark ISIC 2016 and PH2 datasets. The experiments were performed using Python 3.2 with Jupyter Notebook and the libraries Keras, Sklearn (scikit-learn), cv2 (OpenCV), Scipy, os, Random, and Matplotlib. The models are trained on a workstation with the following specification: NVIDIA Quadro P4000 14-core GPU with 8 GB graphics memory, Intel Xeon dual-processor workstation at 2.5 GHz, 64 GB DDR4 RAM, and Windows 10 Home edition. CUDA and cuDNN are the NVIDIA libraries needed for programming with NVIDIA GPUs. The models are designed using the Keras library with the TensorFlow framework. Every model is trained using the RMSprop optimizer, which normalizes the gradients, and a dropout of 0.5 is used to reduce overfitting in transfer learning.
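A transfer learning setup of the kind described here (a pre-trained VGG-16 base, dropout of 0.5, and the RMSprop optimizer) might look as follows in Keras. This is a sketch under assumptions, not the authors' network: the frozen base, the global average pooling layer, the single sigmoid output, and the function name `build_vgg16_transfer_model` are illustrative choices that the text does not specify.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_vgg16_transfer_model(input_shape=(224, 224, 3), weights="imagenet"):
    """Binary benign/malignant classifier on top of a VGG-16 convolutional base."""
    # Pre-trained convolutional base without the original ImageNet classifier head.
    base = keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # assumption: only the new head is trained at first

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.5)(x)  # dropout rate reported in the text
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.RMSprop(),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Passing `weights=None` builds the same architecture with random initialization, which is useful for checking the wiring without downloading the ImageNet weights.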

Dataset details
The publicly available benchmark ISIC 2016 and PH2 datasets are used for experimentation in this work. The ISIC and PH2 datasets comprise 1271 and 200 actual RGB images, respectively. The class ratio given in Table 1 shows the skewness in both datasets, in which benign lesion images form the majority. Therefore, to rebalance the datasets, images are augmented using rotation, vertical and horizontal flips, and horizontal and vertical shear. After augmentation, the ISIC and PH2 datasets comprise 8018 and 8640 images, respectively. In the ISIC dataset, 1604 images (20% of the dataset) are used for testing and 6414 images (80%) are used for training. Similarly, in the PH2 dataset, 1728 images (20% of the dataset) are used for testing and 6912 images (80%) are used for training. For validation, 5-fold cross-validation is used. Table 4 shows the parameters tuned for the machine learning classifiers and the deep learning-based transfer learning models for effective and accurate classification.

Classification results using machine learning and deep learning-based transfer learning models on PH2 and ISIC 16 datasets
Two sets of experiments are carried out in this work on PH2 and ISIC-2016 using machine learning and deep learning-based transfer learning models. For the classification of pigmented skin lesions, four state-of-the-art machine learning classifiers, namely RF, LR, K-NN, and SVM, and the deep learning-based transfer learning models Inception, Xception, VGG-16, VGG-19, and LeNet-5 are used. The first set of experiments is conducted on the PH2 dataset, with and without augmentation, using machine learning and deep learning-based transfer learning models.
Similarly, the second set of experiments is performed on the ISIC-2016 dataset using machine learning and deep learning-based transfer learning models on the augmented and actual datasets.

PH2 Dataset
In the first set of experiments, we evaluated the efficiency of the designed CAD-MD system using machine learning and deep learning-based transfer learning models on actual and augmented images of the PH2 dataset. The results on the PH2 dataset with and without augmentation are reported in Table 5. The results obtained on actual images are poorer than those obtained on the augmented dataset for both machine learning and deep learning models. Among the machine learning models, Logistic Regression performs best on the actual dataset with an accuracy of 62.5%, while k-NN performs best on the augmented dataset with an accuracy of 66.7%. Among the deep learning-based transfer learning models, VGG16 performs best on the actual dataset with an accuracy of 95.7% and also on the augmented dataset with an accuracy of 99.1%. The performance of the machine learning and deep learning-based transfer learning models on the actual and augmented PH2 dataset is shown graphically in Figure 5 and Figure 6, respectively.

ISIC16 Dataset
In the second set of experiments, the efficiency of the designed CAD-MD system is evaluated using machine learning and deep learning-based transfer learning models on actual and augmented images of the ISIC 16 dataset. The results on the ISIC 16 dataset with and without augmentation are reported in Table 6. As with the results reported in Table 5, the results in Table 6 show poorer performance on actual images than on the augmented dataset. Among the machine learning models, k-NN performs best on both the non-augmented and augmented datasets, with accuracies of 56.7% and 61.6%, respectively. Among the deep learning models, LeNet-5 performs best on the actual dataset with an accuracy of 77.9% and also on the augmented dataset with an accuracy of 82.5%. The performance of the machine learning and deep learning-based transfer learning models on the actual and augmented ISIC 16 dataset is shown graphically in Figure 7 and Figure 8, respectively. Table 5 and Table 6 show that the use of deep learning-based transfer learning models improves the accuracy of the proposed system, because transfer learning incorporates pre-trained weights that are used in training every deep learning model. The performance of all the machine learning and deep learning-based transfer learning architectures is evaluated based on accuracy, sensitivity, specificity, Jaccard index, and Dice coefficient. Since VGG16 achieved the highest accuracy among all employed classifiers, we evaluated its receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC-ROC) on the PH2 dataset. Figure 9 shows the ROC curve obtained with the VGG-16 model, with an AUC value of 99.1% on the PH2 dataset.
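For reference, the AUC-ROC reported above equals the probability that a randomly chosen malignant image receives a higher score than a randomly chosen benign one. The following plain-Python sketch computes it directly from that definition; the function name and the toy scores are illustrative assumptions and are unrelated to the paper's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

With the toy labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75.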

Discussion
The designed CAD-MD system makes a vital contribution to the early and accurate diagnosis of melanoma for several reasons. First, our study comprises all the phases required for developing an effective automated diagnostic system for the detection of melanoma and the classification of lesions as benign or malignant. Second, it provides useful insights that assist researchers in investigating the importance of data augmentation for achieving optimal classification performance. Third, our designed system incorporates the potential of machine learning and deep learning-based transfer learning models for early and accurate melanoma detection, which has been overlooked by previous studies. CAD-MD not only aims to provide reliable melanoma diagnosis but also assists dermatologists and medical experts by providing valuable insights for identifying melanoma accurately. Since the deep learning-based transfer learning models outperform the machine learning classifiers, we compared the best-performing model of our proposed work with previously conducted studies on the PH2 and ISIC-2016 datasets, as shown in Table 7.

Conclusion and future work
In this work, an effective CAD-MD system is designed to improve the classification of melanoma and to explore the influence of data augmentation on the performance of four machine learning models, namely SVM, Logistic Regression, KNN, and Random Forest, and five deep learning-based transfer learning models, namely LeNet-5, VGG-16, VGG-19, Inception, and Xception. Our work overcomes the data limitation and skewness issues by incorporating data augmentation. Two sets of experiments are conducted on the PH2 and ISIC16 datasets to evaluate the effectiveness of all employed classifiers with and without image augmentation. Empirical results show that better classification is achieved with augmented images, where deep learning-based transfer learning models outperform machine learning classifiers. The VGG16 CNN model performs best on the PH2 dataset with an accuracy of 99.13%. The performance of the classifiers is verified using appropriate measures, namely accuracy, sensitivity, specificity, Jaccard index, and Dice coefficient. A limitation of our work is that, in designing the CAD-MD system, we reviewed only work published until August 2020. Since CAD systems are a rapidly emerging field of technology, our work has missed some recent work conducted between August 2020 and April 2021 in the field of melanoma detection and classification. Future work can use new deep learning models for transfer learning to improve the results on datasets of other medical diseases. In the future, the designed CAD system can be further enhanced for smart devices. It can also be used to diagnose several other severe skin lesion types, which vary across skin colors.

Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest
The authors declare that they have no conflict of interest.

Informed consent
Informed consent was obtained from all individual participants included in the study.