Skin Melanoma Classification System Using Deep Learning

Abstract: The deadliest type of skin cancer is malignant melanoma, and early diagnosis is required to reduce the mortality rate. In this study, an efficient Skin Melanoma Classification (SMC) system is presented that uses dermoscopic images as a non-invasive procedure. The SMC system consists of four modules: segmentation, feature extraction, feature reduction and classification. In the first module, k-means clustering is applied to cluster the colour information of dermoscopic images. The second module extracts meaningful and useful descriptors based on the statistics of local properties, the parameters of a Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model of wavelet coefficients, and spatial patterns captured by the Dominant Rotated Local Binary Pattern (DRLBP). The third module reduces the features by the t-test, and the last module uses deep learning for the classification. Evaluated individually, the GARCH parameters of the 3rd DWT level sub-bands provide 92.50% accuracy for the 1st stage (normal/abnormal), higher than the local properties (77.5%) and DRLBP (88%) features. For the 2nd stage (benign/malignant), the accuracies are 95.83% (GARCH), 90% (DRLBP) and 80.8% (local properties). The selected 2% of features from the combination gives 99.5% and 100% accuracy for the 1st and 2nd stages of the SMC system, respectively. The greatest degree of success is achieved on PH² database images using two stages of deep learning. The system can be used as a pre-screening tool as it provides 100% accuracy for melanoma cases.


Introduction
The decrease in the mortality rate due to skin cancer may be attributed to several treatment and detection factors. Owing to the vast amount of research in both categories, significant advances have been seen over the past 30 years. One of the prognostic factors for a cancer cure is early detection. Currently, advances in imaging techniques and computerized systems provide better results. The imaging technique used for skin cancer diagnosis is dermoscopy, in which a magnified visualization of the affected skin region is acquired. It shows morphological structures that cannot be seen by the naked eye. The accuracy of skin melanoma diagnosis has been improved by the use of several algorithms, such as the ABCD rule [1], the 7-point checklist [2] and pattern analysis [3]. However, the interpretation is time-consuming and subjective, since it depends on the skill of the dermatologist. To overcome these difficulties, computerized systems have been developed to assist dermatologists.
Among the various components of a computerized system, feature extraction and classification are the main ones. Many spatial and spectral features are utilized in the former, and many machine learning methods have been developed for the latter. Apart from the ABCD rule [1], quantitative measures such as the shape of the affected region [4] and colour and structural differences [5,6] are used for effective classification. Textural patterns play a vital role in extracting dominant features in the medical domain. These patterns include Haralick features [7] from the gray level co-occurrence matrix, Laws texture features [8], Local Binary Pattern (LBP) [9] and their extensions.
Due to the advancement in multiresolution analysis, frequency domain features are added to the feature vector to increase accuracy further. Some of them are the Discrete Wavelet Transform (DWT) [10], Multiwavelet transform [11], Contourlet [12], Curvelet [13], and Shearlet [14]. They can be used for the diagnosis independently or combined with the statistical features for better performance. The novelty of this study is the choice of feature vectors used for effective classification of dermoscopic images. DWT is used in conjunction with GARCH [15] to generate one of the feature vectors, which is combined with the other features. In the diagnosis stage, many machine learning approaches are used for classification, such as SVM [11], Naive Bayes [12], k-nearest neighbour [16], and decision trees [17].
The evolution of deep learning [18][19][20] has made computerized systems highly effective in melanoma diagnosis, and thus this study uses deep learning as a classifier. The objective is to design a computerized SMC system with a high level of sensitivity and specificity. The rest of the paper is organized as follows: Section 2 discusses the design of the SMC system, and the results obtained in this study are discussed in Section 3. The conclusion is given in the last section.

Related Works
This Section discusses the design of the SMC system. It consists of four sequential steps which are illustrated in Fig. 1. Section 2.1 explains the lesion segmentation by a clustering approach. Section 2.2 describes how the features are extracted from dermoscopic images. The feature reduction technique is described in Section 2.3, and Section 2.4 explains the deep learning approach for the classification.

Segmentation
The skin cancer region is segmented by applying a k-means clustering approach to the RGB colour image. Before segmentation, noise and hair in the images are removed by an averaging filter with a predefined window size of 21 × 21. The basic k-means clustering approach is applied to segment gray scale images [21]. As dermoscopic images are colour images and the colour information is very useful for extracting skin lesions, k-means clustering is modified to accept colour images. Visual differences are more easily quantified in the L*a*b* colour space than in RGB. The conversion formulae can be found in [21]; in L*a*b*, all colour information is carried by only two channels, a* and b*. k-means clustering is then applied to cluster this colour information using the Euclidean distance metric. In this study, k is set to 3, so that the lesion area, the background and the unaffected skin are clustered separately. The skin lesion area can then be easily separated from the other two clusters. Fig. 2 shows the results of the segmentation approach.
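The clustering step can be illustrated with a minimal NumPy sketch. It assumes the a* and b* channels have already been computed (for example with `skimage.color.rgb2lab`) and stacked into an (H, W, 2) array. The function name `kmeans_ab`, the farthest-point initialisation and the iteration limit are illustrative choices, not details taken from this study.

```python
import numpy as np

def kmeans_ab(ab, k=3, n_iter=50, seed=0):
    """Cluster an (H, W, 2) array of a*, b* values into k colour groups."""
    h, w, _ = ab.shape
    pixels = ab.reshape(-1, 2).astype(float)
    rng = np.random.default_rng(seed)

    # Assumed initialisation: one random pixel, then repeatedly the pixel
    # farthest from the centroids chosen so far.
    centroids = [pixels[rng.integers(len(pixels))]]
    for _ in range(1, k):
        d = np.min([((pixels - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(pixels[d.argmax()])
    centroids = np.array(centroids)

    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every pixel to every centroid.
        dist = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centroids = centroids.copy()
        for c in range(k):
            members = pixels[labels == c]
            if len(members):
                new_centroids[c] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels.reshape(h, w), centroids
```

With k = 3, the returned label map holds one integer per pixel. Which label corresponds to the lesion still has to be decided separately, for instance from the cluster centroids.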

Feature Extraction
Feature extraction aims to preserve the class-discriminating information so that the best class separation is achieved with the least computational complexity. A classifier then uses these features to decide whether the region is normal or abnormal. The advantage of extracting descriptors is that, if carefully chosen, they give a more compact representation of the segmented region than the image pixels alone. The features are usually chosen based on the domain under consideration, and in this study they fall into three categories. The first group is based on the statistics of local properties, and the second group consists of the parameters of the GARCH model of wavelet coefficients. Thirdly, spatial patterns of skin lesions are also recorded.

Figure 2: Outputs of segmentation of the SMC system

Local Properties
There are four local properties extracted in this study: mean (μ), standard deviation (σ), skewness (S) and kurtosis (K). They are computed from the central moments about the mean. The first moment, the mean (μ), is simply the sum of the pixel intensities (PI) divided by the number of pixels (n) in the dermoscopic image (Eq. (1)).
The second moment is called the variance, and its positive square root is σ (Eq. (2)). It measures how much a PI in the dermoscopic image can be expected to deviate from μ. A low σ indicates that the PIs are clustered about μ, while a high σ means the opposite.
Using Eqs. (1)-(4), the local properties are computed for each colour channel and stored in the feature database.
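As a sketch of how these four statistics could be computed per colour channel, the snippet below uses the usual moment-based definitions with normalisation by n. That normalisation, the function names and the zero-variance convention are assumptions, since the exact forms of Eqs. (1)-(4) are not reproduced here.

```python
import numpy as np

def local_properties(channel):
    """Return (mean, std, skewness, kurtosis) of one colour channel."""
    x = np.asarray(channel, dtype=float).ravel()
    mu = x.mean()
    centred = x - mu
    sigma = np.sqrt((centred ** 2).mean())   # square root of the 2nd central moment
    if sigma == 0:
        # Assumed convention for a perfectly flat channel.
        return mu, 0.0, 0.0, 0.0
    skew = (centred ** 3).mean() / sigma ** 3  # 3rd standardised moment
    kurt = (centred ** 4).mean() / sigma ** 4  # 4th standardised moment (3 for a Gaussian)
    return mu, sigma, skew, kurt

def local_feature_vector(image):
    """Concatenate the four statistics of every channel of an (H, W, C) image."""
    image = np.asarray(image, dtype=float)
    feats = [v for c in range(image.shape[2]) for v in local_properties(image[..., c])]
    return np.array(feats)
```

For a three-channel image this yields a 12-element vector, four values per channel.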

GARCH Features
To boost the performance of the SMC system, the GARCH model is applied in the wavelet domain. The GARCH(p, q) model for $\varepsilon_t$ is as follows [15]:

$$\varepsilon_t = \sigma_t z_t \qquad (5)$$

and

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 \qquad (6)$$

where $\alpha_0$, $\alpha_i$ and $\beta_j$ are the GARCH parameters, which are estimated using maximum likelihood estimation [15] with p = q = 1. $z_t$ and $\sigma_t$ represent a random variable and the conditional standard deviation, respectively; $z_t$ is drawn from a Gaussian distribution with zero mean and unit variance.
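To make the estimation step concrete, the following sketch evaluates the Gaussian negative log-likelihood of a GARCH(1, 1) model for a sequence of coefficients. The function names, the choice of the sample variance as the starting value of the recursion and the parameter constraints are assumptions made for illustration. The estimator in [15] may differ in these details.

```python
import numpy as np

def garch11_variance(eps, alpha0, alpha1, beta1):
    """Conditional variances sigma_t^2 of a GARCH(1, 1) recursion."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty_like(eps)
    sigma2[0] = eps.var()  # assumed start value: sample variance of the series
    for t in range(1, len(eps)):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2

def garch11_neg_loglik(params, eps):
    """Gaussian negative log-likelihood; lower values mean a better fit."""
    alpha0, alpha1, beta1 = params
    # Assumed constraints: positive variance and a stationary process.
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
        return np.inf
    eps = np.asarray(eps, dtype=float)
    sigma2 = garch11_variance(eps, alpha0, alpha1, beta1)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + eps ** 2 / sigma2)
```

Maximum likelihood estimates are then the parameters that minimise `garch11_neg_loglik`. One option is to pass it to a general-purpose optimiser such as `scipy.optimize.minimize`, and the fitted (α0, α1, β1) of each sub-band would form the GARCH part of the feature vector.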
First, the dermoscopic images are transformed into the DWT domain, which is a powerful tool used in many pattern recognition techniques [22]. DWT is very useful for increasing the accuracy of the SMC system as it provides localized frequency information. Fig. 3 shows a DWT decomposition of a dermoscopic image at two levels. Applying one level of DWT to a dermoscopic image yields four sub-bands, obtained by applying low-pass and high-pass filters along the rows and columns in a predefined manner.
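The sub-band structure can be sketched with a hand-written Haar transform. The wavelet family is not named in this section, so Haar is an assumption chosen for brevity, and the function names and the LH/HL naming convention are illustrative. A library routine such as `pywt.wavedec2` from PyWavelets would normally be used instead.

```python
import numpy as np

def haar_dwt2(image):
    """One level of a 2-D Haar DWT: returns LL and the (LH, HL, HH) details."""
    x = np.asarray(image, dtype=float)
    # Crop to even dimensions so that pixels can be paired.
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]
    # Low-pass (sum) and high-pass (difference) filtering along each row.
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    # The same two filters applied along each column.
    ll = (lo[0::2] + lo[1::2]) / np.sqrt(2)
    lh = (lo[0::2] - lo[1::2]) / np.sqrt(2)
    hl = (hi[0::2] + hi[1::2]) / np.sqrt(2)
    hh = (hi[0::2] - hi[1::2]) / np.sqrt(2)
    return ll, (lh, hl, hh)

def haar_wavedec2(image, levels):
    """Repeat the decomposition on the LL band; details[0] is the finest level."""
    approx, details = image, []
    for _ in range(levels):
        approx, d = haar_dwt2(approx)
        details.append(d)
    return approx, details
```

Each additional level decomposes only the LL band again, so a three-level decomposition produces three sets of detail sub-bands plus one final approximation.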
It is noted that the GARCH model is efficient only when the distribution of the data has a heavy tail [23]. Thus, the distribution of the coefficients in each sub-band of the DWT domain is tested for a heavy tail. This is done by calculating the K value using Eq. (4), which indicates whether the distribution has a heavy tail or not. For a Gaussian distribution, K is three. If the K value of any data is greater than three, it indicates a heavier tail than the Gaussian distribution. Hence, the GARCH parameters are extracted only from the sub-bands which have a K of more than three.
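The heavy-tail test above amounts to a simple filter on the sub-bands. A minimal sketch follows, assuming the sub-bands are held in a dictionary keyed by name. The names `kurtosis` and `heavy_tailed_subbands` and the zero-variance convention are hypothetical.

```python
import numpy as np

def kurtosis(coeffs):
    """Fourth standardised moment K; equals 3 for a Gaussian distribution."""
    x = np.asarray(coeffs, dtype=float).ravel()
    centred = x - x.mean()
    var = (centred ** 2).mean()
    if var == 0:
        return 0.0  # assumed convention for a constant sub-band
    return (centred ** 4).mean() / var ** 2

def heavy_tailed_subbands(subbands, threshold=3.0):
    """Names of the sub-bands whose kurtosis exceeds the Gaussian value."""
    return [name for name, coeffs in subbands.items() if kurtosis(coeffs) > threshold]
```

Only the sub-bands returned by `heavy_tailed_subbands` would then be passed on to GARCH parameter estimation.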

DRLBP Features
DRLBP [24] is an extended version of LBP [9]. LBP features are not rotation invariant because of the fixed order of the weights. To overcome this limitation, the weights are arranged based on a reference direction D, which is computed locally. It is defined as

$$D = \arg\max_{n \in \{0, 1, \ldots, N-1\}} |i_n - i_c| \qquad (7)$$

where $i_c$ and $i_n$ are the intensities of the central pixel and the nth neighborhood pixel, respectively, and N is the number of neighborhood pixels. The DRLBP based on D is given below.

$$\mathrm{DRLBP} = \sum_{n=0}^{N-1} s(i_n - i_c) \, 2^{\operatorname{mod}(n - D,\, N)} \qquad (8)$$

Here s(x) is 1 for x ≥ 0 and 0 otherwise. It is evident from Eq. (8) that the weights depend on D, and thus DRLBP satisfies the rotation-invariance property. DRLBP gives the spatial pattern in the dermoscopic images. In this study, the DRLBP features are computed for each colour channel and stored in the feature database. This feature extraction is also used in [25,26].
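The rotated weighting can be sketched for the common case of 8 neighbours at radius 1. The neighbour ordering, the function names and the 256-bin histogram are illustrative assumptions. The sketch covers only the reference direction and the rotated code, not any later selection of dominant patterns.

```python
import numpy as np

# (row, col) offsets of the 8 neighbours, listed in circular order.
NEIGHBOURS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def rotated_lbp_code(patch):
    """Rotated LBP code of the centre pixel of a 3x3 patch."""
    patch = np.asarray(patch, dtype=float)
    centre = patch[1, 1]
    diffs = np.array([patch[1 + dr, 1 + dc] - centre for dr, dc in NEIGHBOURS])
    n_nb = len(diffs)
    d = int(np.argmax(np.abs(diffs)))            # reference direction D
    bits = (diffs >= 0).astype(int)              # thresholding function s(.)
    weights = 2 ** ((np.arange(n_nb) - d) % n_nb)  # weights shifted so D gets 2^0
    return int((bits * weights).sum())

def rotated_lbp_histogram(channel):
    """256-bin histogram of the codes over all interior pixels of one channel."""
    channel = np.asarray(channel, dtype=float)
    hist = np.zeros(256, dtype=int)
    for r in range(1, channel.shape[0] - 1):
        for c in range(1, channel.shape[1] - 1):
            hist[rotated_lbp_code(channel[r - 1:r + 2, c - 1:c + 2])] += 1
    return hist
```

Because the weights start at the neighbour with the largest absolute difference, rotating a patch by a multiple of 90° leaves its code unchanged as long as that neighbour is unique. When several neighbours tie, `np.argmax` picks the first, which is one possible tie-breaking rule.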

Feature Reduction
A large set of features makes the classification system extremely computationally intensive. The complexity of the SMC system increases further when a combination of features is used in the classifier. Thus, a feature reduction step is necessary to eliminate the poorly performing features that affect the classifier's performance. The significant features are identified using the t-test [27].
Let us consider features from two classes: $c_1$ with $n_1$ samples and $c_2$ with $n_2$ samples. A statistic t for a particular feature f in the feature set is defined by

$$t(f) = \frac{\mu_1(f) - \mu_2(f)}{\sqrt{\dfrac{\sigma_1^2(f)}{n_1} + \dfrac{\sigma_2^2(f)}{n_2}}} \qquad (9)$$

where $\mu_x(f)$ and $\sigma_x(f)$ are the mean and standard deviation of f in the xth class. Applying Eq. (9) to every feature produces one t value per feature. This value indicates the significance of the feature. The features are sorted by their t values, and those having high t values are selected.
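A sketch of this ranking is given below, under three assumptions that are not stated in the text: the unequal-variance form of the two-sample t statistic, the sample standard deviation (`ddof=1`), and ranking by absolute value. The function names and the `fraction` parameter are hypothetical.

```python
import numpy as np

def t_scores(x1, x2):
    """Per-feature t statistic for class matrices of shape (n1, F) and (n2, F)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    denom = np.sqrt(x1.var(axis=0, ddof=1) / n1 + x2.var(axis=0, ddof=1) / n2)
    t = np.zeros_like(diff)
    ok = denom > 0
    t[ok] = diff[ok] / denom[ok]
    # Assumed convention: zero spread with different means is perfectly separable.
    t[~ok & (diff != 0)] = np.inf
    return t

def select_top_features(x1, x2, fraction=0.02):
    """Indices of the top `fraction` of features, ranked by |t|."""
    scores = np.abs(t_scores(x1, x2))
    n_keep = max(1, int(round(fraction * scores.size)))
    order = np.argsort(-scores, kind="stable")
    return np.sort(order[:n_keep])
```

Calling `select_top_features(x1, x2, fraction=0.02)` on the combined feature matrix would correspond to keeping the best 2% of features discussed in the results.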

Classification by Deep Learning
The non-linear relationship between the features of different classes can be modeled by neural networks, which consist of an input layer (number of features), hidden layers (normally 1) and an output layer (number of classes). The information in brackets gives the size of each part: the number of input nodes, the number of hidden layers and the number of output nodes. The relationship between the features can be modeled more effectively if the number of hidden layers is increased. This is called deep learning [28], and in this study it is applied for the classification.
In the training phase, the error between the actual and desired output is computed first, using the mean-squared error function. The weights are then updated iteratively from this error signal by the backpropagation algorithm, a gradient descent algorithm that propagates the error from the output layer to the lower layers. In the layers that use the sigmoid activation function, the propagated error is multiplied by the derivative of that function. The process is stopped when the error has been reduced to a predefined level. The weights are adjusted with the help of a learning rate and a momentum factor, which dampen oscillations so that the error is reduced in a descent direction. As the SMC system outputs a binary decision, a linear function is used in the output layer.
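The training procedure described above can be sketched as a small fully connected network in NumPy, with sigmoid hidden layers, a linear output, a mean-squared error loss and momentum updates. The layer sizes, learning rate, momentum value, stopping tolerance and function names are illustrative assumptions. They are not the configuration used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(x, y, hidden=(8, 8), lr=0.05, momentum=0.9, epochs=2000, tol=1e-3, seed=0):
    """Full-batch backpropagation with momentum; y has shape (n, 1) with 0/1 labels."""
    rng = np.random.default_rng(seed)
    sizes = [x.shape[1], *hidden, 1]
    weights = [rng.normal(0.0, 0.5, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    vel_w = [np.zeros_like(w) for w in weights]
    vel_b = [np.zeros_like(b) for b in biases]
    history = []
    for _ in range(epochs):
        # Forward pass: sigmoid in hidden layers, linear in the output layer.
        acts = [x]
        for i, (w, b) in enumerate(zip(weights, biases)):
            z = acts[-1] @ w + b
            acts.append(z if i == len(weights) - 1 else sigmoid(z))
        err = acts[-1] - y
        mse = float((err ** 2).mean())
        history.append(mse)
        if mse < tol:  # stop once the error reaches the predefined level
            break
        # Backward pass: propagate the error signal from output to input.
        delta = 2.0 * err / len(x)
        for i in reversed(range(len(weights))):
            grad_w = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Multiply by the sigmoid derivative of the layer below.
                delta = (delta @ weights[i].T) * acts[i] * (1.0 - acts[i])
            vel_w[i] = momentum * vel_w[i] - lr * grad_w
            vel_b[i] = momentum * vel_b[i] - lr * grad_b
            weights[i] += vel_w[i]
            biases[i] += vel_b[i]
    return weights, biases, history

def predict(x, weights, biases):
    """Threshold the linear output at 0.5 to obtain a binary decision."""
    a = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        z = a @ w + b
        a = z if i == len(weights) - 1 else sigmoid(z)
    return (a > 0.5).astype(int).ravel()
```

The `history` list records the error per epoch, which makes it easy to check that the momentum term is damping oscillations and not amplifying them.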

Results and Discussion
The developed SMC system is analyzed using a publicly available dermoscopic image database, PH² [29,30]. It is extremely useful for the development and testing of any computerized skin cancer classification system. It also includes ground truth data that describe the type and severity of the abnormality present in the dermoscopic images. The system classifies the dermoscopic images into normal and abnormal categories in the first stage and then into benign or malignant in the second stage. The details of the database are listed in Tab. 1.

The accuracy ($A_c$) of the system can be broken into two important measures: sensitivity ($S_n$) and specificity ($S_p$). Before defining these two terms, four more variables are to be computed from the SMC system's outputs. When the system correctly identifies a positive result, it is referred to as a True Positive (TP), and if the system incorrectly identifies a positive result, it is referred to as a False Positive (FP). Similarly, two more terms, True Negative (TN) and False Negative (FN), are defined for negative results. Sensitivity and specificity are defined as follows:

$$S_n = \frac{TP}{TP + FN}$$

$$S_p = \frac{TN}{TN + FP}$$

There is no misclassification for a perfect system, which means that sensitivity and specificity will both be 100%. A high sensitivity can lead to a decrease in the mortality rate. These measures are computed using a k-fold (10-fold) cross-validation testing scheme, in which the classifier uses k-1 folds for training and the remaining fold for testing.
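The evaluation measures and the fold splitting can be sketched as follows. The function names are hypothetical, labels are assumed to be coded as 1 for positive and 0 for negative, and the random shuffling before splitting is an assumption about how the folds are formed.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 = positive."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index arrays; each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

In a 10-fold run, the counts from the ten test folds would be accumulated before computing the three measures, so that every image contributes to the result exactly once.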
As the GARCH features are extracted from the DWT at several resolution levels of decomposition, the local properties and DRLBP are first analyzed independently. All normal images are considered negative samples and all abnormal images positive samples in this stage. Then, k-fold cross-validation is employed to split the images of these two groups for training and testing purposes. The confusion matrices obtained from these features are given in Fig. 4. It is evident from Fig. 4 that the DRLBP features provide better performance than the local properties of dermoscopic images. This is because DRLBP effectively extracts the spatial patterns that are present in the different classes of images. It improves sensitivity, specificity and accuracy by ∼10% over the local properties. The GARCH parameters are extracted from different DWT levels (DWT-L), and their performances are analyzed. Tab. 2 shows the performance of the 1st stage SMC system using GARCH features at each DWT level. It is observed that over 90% accuracy is obtained by the GARCH parameters extracted from the sub-bands of the 3rd level DWT. It is well known that more information can be obtained by increasing the number of resolution levels. However, the features obtained from higher resolution levels reduce the system's accuracy because of redundant data, as can be seen for the 4th level DWT features. Also, it is evident from Tab. 2 and Fig. 4 that the GARCH features perform better than the others.
Using the features independently in the 1st stage SMC system gives reasonable results, but these are insufficient in the medical field, which requires higher accuracy to decrease mortality. The redundant features in each group, which affect the performance, are eliminated by the feature reduction approach to obtain higher accuracy. Tab. 3 shows the performance of the 1st stage SMC system after the feature reduction approach.
After feature reduction, the highest performance is 99.17% sensitivity and 100% specificity, obtained with the selected 2% of features. With more features, both performance measures are reduced, and thus the system selects only 2% of the combined features as the best set to classify abnormal images. Fig. 5 shows the SMC system's accuracy for all feature sets used in the 1st stage. Fig. 6 shows the confusion matrices of the 2nd stage SMC system obtained using local properties and DRLBP, respectively. In the 2nd stage SMC system, DRLBP has a maximum specificity of 86.8% and sensitivity of 92.5%. Tab. 4 shows the performance of the 2nd stage SMC system using GARCH features at each DWT level.
The best-performing GARCH features are those extracted from the 3rd level. The sensitivity of the 3rd level GARCH features is ∼5% higher than that of the other features. Tab. 5 shows the 2nd stage SMC system's performance after the feature reduction approach. After feature reduction, 100% sensitivity and specificity are achieved with the selected 2% of features. Fig. 7 shows the SMC system's accuracy for all feature sets used in the 2nd stage. To analyze the SMC system visually for the three types of features and their combination, ROC curves are used. Fig. 8 shows the ROCs of the 1st and 2nd stages of the SMC system. The classification is significantly better for the best-selected features (2% of features) than for the others in both stages.

Conclusion
An efficient SMC system which combines segmentation, feature extraction, feature reduction and classification stages into one automated operation is developed and investigated for skin cancer diagnosis. The use of local properties, GARCH parameters from 3rd DWT level sub-bands and DRLBP to classify skin melanoma images is tested. Deep learning is tested using PH² database images and gives near-ideal system performance in terms of accuracy, sensitivity and specificity. Also, it is found that GARCH modelling can indeed be used for skin cancer diagnosis, and that there are performance differences between these features. The sensitivities of the 1st and 2nd stages of the SMC system are 99.17% and 100% respectively, with all normal images perfectly classified. The greatest degree of success is achieved on PH² database images using two stages of deep learning. The system can be used as a pre-screening tool as it provides 100% accuracy for melanoma cases.

Funding Statement:
The author(s) received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.