Multiclass Skin Lesion Classification Using a Novel Lightweight Deep Learning Framework for Smart Healthcare

Abstract: Skin lesion classification has recently attracted significant attention. Physicians often spend much time analyzing skin lesions because of the high similarity between lesion types. An automated classification system using deep learning can assist physicians in detecting the skin lesion type and improve the patient's health. Skin lesion classification has become a hot research area with the evolution of deep learning architectures. In this study, we propose a novel method using a new segmentation approach and wide-ShuffleNet for skin lesion classification. First, we calculate the entropy-based weighting and first-order cumulative moment (EW-FCM) of the skin image. These values are used to separate the lesion from the background. Then, we input the segmentation result into a new deep learning structure, wide-ShuffleNet, to determine the skin lesion type. We evaluated the proposed method on two large datasets: HAM10000 and ISIC2019. Based on our numerical results, EW-FCM and wide-ShuffleNet achieve higher accuracy than state-of-the-art approaches. Additionally, the proposed method is extremely lightweight and suitable for small systems such as mobile healthcare systems.


Introduction
Skin lesions, which are irregular skin changes compared with the neighboring tissue, can evolve into skin cancer, one of the most dangerous cancers. There are two main types of skin cancer: nonmelanoma and melanoma. Melanoma lesions are responsible for the significant increase in mortality and morbidity in recent years; they are the most destructive and dangerous among the various lesion types [1]. If physicians detect the lesions early, they can increase the cure rate to 90% [2]. However, visual inspection for skin cancer is difficult because of the high similarity among different skin lesion types (e.g., nonmelanoma and melanoma), which can lead to misdiagnosis. A solution for healthcare systems [3] and image inspection [4] is the automatic classification of lesion images by machine learning (ML).
Presently, 132,000 melanoma skin lesion cases and approximately three million nonmelanoma skin lesion cases occur yearly worldwide. Furthermore, according to the World Health Organization, 60,000 people die yearly due to prolonged sun exposure (12,000 from nonmelanoma and 48,000 from melanoma). Approximately 80% of skin cancer deaths are caused by melanoma lesions [5]. Besides long sun exposure, a history of sunburn has been linked to the development of skin cancer, especially melanoma. In the early stages, patient survival rates can be improved if melanoma is identified correctly [6]. To handle interobserver differences, technicians are trained to recognize melanoma manually. Consequently, an automatic classification system can enhance the precision and efficiency of the early detection of this cancer type.

The main contributions of this study are as follows:
• We propose a novel method to segment the skin image using the entropy-based weighting (EW) and first-order cumulative moment (FCM) of the skin image.
• A two-dimensional wide-ShuffleNet network is applied to classify the segmented image after applying EW-FCM. To the best of our knowledge, EW-FCM and wide-ShuffleNet are novel approaches.
• Based on our numerical results on the HAM10000 and ISIC2019 datasets, the proposed framework is more efficient and accurate than state-of-the-art methods.
The remainder of the paper is organized as follows. We explore the related works in Section 2. In Section 3, we present the proposed method. Section 4 presents the numerical results and analysis. Finally, Section 5 presents the conclusion and future studies.

Related Works
There are two strategies for skin lesion classification: machine learning (ML) and deep learning (DL) methods [29].

ML Approaches
K-nearest neighbor (KNN) is a supervised ML algorithm used in predictive and forecasting models [30]. The accuracy of the KNN algorithm is considerably good [31]. Sajid et al. [32] proposed a KNN-based automated skin cancer diagnostic system. Their system employed a median filter to remove image noise and used a collection of statistical and textural features: textural features were extracted from the curvelet domain, whereas statistical features were extracted from the lesion images. The framework then classified the input images as noncancerous or cancerous. However, the KNN model requires a long time to produce predictions and is unsuitable for big datasets. Moreover, the KNN algorithm performs poorly with improper feature information in high-dimensional input data, making it unsuitable for skin lesion classification [33].
Alam et al. [34] applied SVM to detect eczema. The approach in [35] involves several steps: image segmentation, feature extraction using texture-based data, and finally deciding the type of eczema with SVM. Upadhyay et al. [36] extracted orientation histogram, gradient, and location features of skin lesions. These features were fused and classified as malignant or benign using an SVM algorithm. The SVM algorithm is unsuitable for handling noisy input images [37], and if the number of training samples is smaller than the number of feature vector parameters, SVM gives lower performance.
The Bayesian algorithm is another approach used in skin lesion classification with multiple trained skin image databases [38], although applying the Naïve Bayes algorithm to multiobjective areas is not easy [39]. The decision tree model [40] has been widely used for skin lesion classification, prediction of lower-limb lesions, and cervical disease. Arasi et al. [41] presented intelligent techniques, decision tree and Naïve Bayes, to diagnose malignant melanoma. The extracted features are based on principal component analysis and a hybrid discrete wavelet transform. These features become the input to different classification methods, such as decision tree and Naïve Bayes, for classifying lesions as benign or malignant. The decision tree algorithm demands big training data to achieve considerable accuracy; moreover, it requires a large amount of memory and more computational time [42].

DL Approaches
There are two DL classification strategies for skin classification: non-segmentation [43] and segmentation approaches.

Non-Segmentation DL Approaches
Menegola et al. [44] applied six open datasets for lesion image classification. They used the Google Inception-v4 and ResNet-101 architectures to detect seborrheic keratosis, malignant melanoma, and nevus. They also confirmed that combining datasets increases the training data and improves classification accuracy. Han et al. [45] introduced ResNet-152 for lesion image classification. The lesions include squamous cell carcinoma, basal cell carcinoma, actinic keratosis, intraepithelial carcinoma, and malignant melanoma. The factors that reduce the recognition accuracy of skin lesions are image contrast and ethnicity.
Esteva et al. [46] used Inception v3 architecture to classify the lesion into three groups. The method first distinguishes between benign and malignant and then separates seborrheic keratoses and keratinocyte carcinoma types. It also recognizes nevi and malignant melanomas.
Fujisawa et al. [47] classified skin lesions into 21 classes and introduced a four-level lesion identification method. Skin pictures are arranged in four levels using the GoogLeNet model. Benign and malignant samples are classified first, followed by recognition of melanocytic and epithelial lesions. The method also identified seborrheic keratosis, actinic keratosis, Bowen disease, and basal cell carcinoma lesions. Zhang et al. [18] inserted an attention layer at the end of a ResNet architecture and created a new attention residual network. They classified seborrheic keratosis, melanoma, and nevus lesions.
Mahbod et al. [48] introduced a new method to recognize skin lesions using fine-tuned pretrained networks. They first applied AlexNet, VGG16, and ResNet to extract skin image features from the last fully connected layers. Then, they applied SVM to fuse these extracted features. Harangi [49] presented a method that fuses the output probabilities of four CNN models: AlexNet, GoogLeNet, VGGNet, and ResNet. The study proposed four fusion techniques to identify skin lesions: seborrheic keratosis, melanoma, and nevus. Sum fusion provides better results than the other rules (simple majority voting, product fusion, and maximal probability).
Nyiri and Kiss [50] presented the classification of dermatological pictures using various CNN models, such as VGG19, VGG16, Inception, Xception, DenseNet, and ResNet. They applied these models to extract features from two different inputs: the original skin image and the segmented image. The proposed method combined the two extracted feature sets to predict skin lesions. Numerical results confirm that ensemble CNNs perform better than single CNNs in skin classification; consequently, ensemble architectures outperform individual architectures.
Matsunaga et al. [51] proposed an ensemble DL technique to classify three lesion classes: seborrheic keratosis, melanoma, and nevus. They used two binary classifiers based on the ResNet-50 architecture. The first classifier distinguishes melanoma from the other lesions, while the second distinguishes seborrheic keratosis from the other lesions. The proposed method recognizes the skin lesion images by combining the output probabilities of these two binary classifiers. Li and Shen [52] combined two ResNet architectures and obtained the features at the fully connected layer of each ResNet model. They combined the extracted features to classify skin lesions into seborrheic keratosis, melanoma, and nevus.

Segmentation DL Approaches
Gonzalez-Diaz [53] introduced three architectures to identify skin lesions: segmentation, structure segmentation, and diagnosis stages. First, the skin picture is segmented; then, the output of this stage is used as the input of the structure segmentation stage. Finally, the diagnosis stage links the outputs of the two previous steps and forecasts the skin lesion type. The generation of a labeled training database is the main challenge to creating a structure segmentation network in which each picture has an associated ground truth. This annotation is usually hard to obtain, as it demands a massive effort of the dermatologists to outline the segmentations manually.
The study in [54] employed a DL segmentation model (U-Net) to create a segmented map of the lesion image, cluster sections of abnormal skin, and give the output to a classification network. Meanwhile, Son et al. [54] drew contours of each cluster with the output mask created by U-Net and applied a convex hull algorithm to crop each cluster. Each cluster was applied as input to the EfficientNet to predict the lesion type.
Al-Masni et al. [55] presented an integrated framework that combines skin lesion boundary segmentation and classification steps. First, the study used a DL full-resolution convolutional network (FrCN) to segment the skin lesion boundaries. Then, a CNN classifier (ResNet-50, Inception-v3, DenseNet-201, or InceptionResNet-v2) was used to classify the segmented skin lesions. The segmentation step is a crucial prerequisite for skin lesion diagnosis because it extracts the principal features of different skin lesions. Table 1 presents the datasets of all DL methods mentioned in this section, including the numbers of images and classes. Three approaches were tested with extensive databases (more than 10,000 images). Besides DL, other segmentation approaches have achieved good results in different fields, such as surface defect detection and mineral separation. Truong et al. [56] presented an automatic thresholding approach that improves Otsu's method by applying an entropy weighting scheme, overcoming the weakness of Otsu's technique in defect detection. Zhan et al. [57] presented an ore image segmentation algorithm using a histogram accumulation moment, applied to multiscenario ore object recognition; ore pictures in three separate scenarios demonstrate the effectiveness and accuracy of the approach. It is reasonable to carry these ideas over to bioimage fields.
The literature thus contains many methods for skin lesion classification, including ML and DL frameworks. However, these methods suffer from one or more of the following drawbacks: (1) missing test results on big data, (2) insufficiently good performance, and (3) heavyweight models. In this study, we present a novel approach to overcome these limitations.

Methodology
We introduce the novel EW-FCM segmentation technique and a new wide-ShuffleNet for skin lesion classification. The segmentation step helps the network separate the skin lesion from the background and boosts the recognition process. The segmentation results (full lesion image, including background and foreground) are used as the input of wide-ShuffleNet for feature extraction and classification. Figure 1 shows the structure of the proposed method. Section 3.1 explains how the EW and the first-order cumulative moment are combined to form the new EW-FCM segmentation technique while maintaining their good characteristics. Section 3.2 introduces the wide-ShuffleNet.

EW-FCM Segmentation Technique
In this section, we first present a short analysis of the Otsu technique. Then, we introduce EW and histogram accumulation moment, including FCM. Finally, the new image threshold technique for image segmentation is presented.
Otsu is one of the most commonly referenced thresholding methods. Let I = g(x,y) be an image whose gray values lie in the interval [0, 1, ..., L − 1]. Denote by n_i the number of pixels with gray value i and by N the total number of pixels in g(x,y). The occurrence probability of gray level i is

p_i = n_i / N.

Assume the threshold th (0 ≤ th ≤ L − 1) divides g(x,y) into two classes: background C_0 = {0, 1, ..., th} and foreground C_1 = {th + 1, ..., L − 1}. The probability of class occurrence and the mean gray level of each class are calculated, respectively, as

ω_0(th) = Σ_{i=0}^{th} p_i,   ω_1(th) = 1 − ω_0(th),

μ_0(th) = (1/ω_0(th)) Σ_{i=0}^{th} i·p_i,   μ_1(th) = (1/ω_1(th)) Σ_{i=th+1}^{L−1} i·p_i.

In Otsu's technique, the quality of the resulting threshold is measured by the separation between the background and foreground. The optimal threshold th* under this criterion must maximize the between-class variance:

th* = argmax_{0 ≤ th ≤ L−1} σ_B²(th),   σ_B²(th) = ω_0(th) ω_1(th) [μ_0(th) − μ_1(th)]².   (4)

The basic idea for improving Otsu's technique is to add a weight W to the objective function in Equation (4) to regulate the output threshold:

th* = argmax_{0 ≤ th ≤ L−1} W(th) σ_B²(th).   (5)

Image entropy describes the properties of an image; it is a mathematical measure of randomness. Images with low entropy values carry minimal information and hold many pixels with similar intensity values. An image with zero entropy means that all pixels hold the same gray value. Reference [56] suggested an EW scheme that substitutes the weight W in Equation (5) with the entropy objective function ψ(th), defined as the sum of the two within-class entropies:

ψ(th) = H_0(th) + H_1(th),

where

H_0(th) = −Σ_{i=0}^{th} (p_i/ω_0(th)) ln(p_i/ω_0(th)),   H_1(th) = −Σ_{i=th+1}^{L−1} (p_i/ω_1(th)) ln(p_i/ω_1(th)).

Next, we discuss the first-order cumulative moment. Let M_O(th) denote the FCM of the gray histogram, which is the cumulative mean gray value from 0 to th:

M_O(th) = Σ_{i=0}^{th} i·p_i.

The mean gray level of the entire image is

M_T = Σ_{i=0}^{L−1} i·p_i.

The FCM M_T helps the optimal threshold avoid dropping into a local optimum [57].

We combine the EW ψ(th) and the FCM to obtain the optimal threshold, creating a new objective function for image segmentation. Expressing the between-class variance through the cumulative moments, the proposed threshold is

th* = argmax_{0 ≤ th ≤ L−1} ψ(th) · [M_T ω_0(th) − M_O(th)]² / [ω_0(th)(1 − ω_0(th))].   (11)

We adopt the segmentation process in reference [58], including texture filtering, thresholding and binarization, and plotting the boundaries. Zade [58] uses the Otsu method to calculate the threshold; our framework instead uses the new objective function in Equation (11) to determine the threshold. Our objective function provides a better segmentation technique because it preserves all properties of EW-FCM. Figure 2 shows the segmentation results of the original Otsu technique, the EW scheme [56], and the proposed EW-FCM segmentation method. As seen in Figure 2, the proposed EW-FCM approach provides better segmentation accuracy than the original Otsu and EW schemes.
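The threshold search described above can be sketched in a few lines of NumPy. The following is a hypothetical reconstruction, not the authors' reference code: it assumes a Kapur-style within-class entropy for ψ(th) and uses the cumulative-moment form of the between-class variance.

```python
import numpy as np

def ew_fcm_threshold(gray, L=256, eps=1e-12):
    """Sketch of an EW-FCM-style threshold search on an image whose
    integer gray levels lie in [0, L-1]."""
    hist = np.bincount(gray.ravel(), minlength=L).astype(float)
    p = hist / hist.sum()                      # gray-level probabilities p_i
    w0 = np.cumsum(p)                          # class probability omega_0(th)
    mo = np.cumsum(np.arange(L) * p)           # first-order cumulative moment M_O(th)
    mt = mo[-1]                                # mean gray of the whole image M_T
    # entropy weight psi(th): sum of the two within-class entropies
    s = np.cumsum(-p * np.log(p + eps))        # partial sums of -p_i ln p_i
    w1 = 1.0 - w0
    h0 = np.log(w0 + eps) + s / (w0 + eps)
    h1 = np.log(w1 + eps) + (s[-1] - s) / (w1 + eps)
    psi = h0 + h1
    # between-class variance written with the cumulative moments
    sigma_b = (mt * w0 - mo) ** 2 / (w0 * w1 + eps)
    obj = np.where((w0 > 0) & (w0 < 1), psi * sigma_b, -np.inf)
    return int(np.argmax(obj))

# A clearly bimodal synthetic image: the threshold should fall between the modes.
rng = np.random.default_rng(0)
img = np.concatenate([rng.integers(40, 60, 500), rng.integers(190, 210, 500)])
th = ew_fcm_threshold(img.reshape(20, 50))
```

For a bimodal histogram such as the synthetic image above, the entropy-weighted objective peaks in the valley between the two modes, which is exactly the behavior the segmentation step relies on.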

Wide-ShuffleNet
We provide a brief analysis of ShuffleNet, which was invented for portable devices. Several concepts are reviewed, including efficient model designs, group convolution, channel shuffle for group convolutions, and the ShuffleNet unit. We also introduce a new variant of ShuffleNet, called wide-ShuffleNet, developed for skin classification.
Efficient model designs: Recently, efficient model designs have played an essential role in deploying DL networks for many computer vision tasks [59–61]. The growing demand for running high-quality DL architectures on embedded systems boosts the research on effective model designs [62]. For instance, instead of simply stacking convolution layers, GoogLeNet [63] expands the network depth with considerably lower complexity. ResNet [64,65] achieves remarkable performance using an effective bottleneck architecture. SqueezeNet [66] preserves accuracy while decreasing computation and parameters significantly.

Group convolution: AlexNet [59] was the first model to use the idea of group convolution, spreading the network across two GPUs, and ResNeXt [67] confirmed the efficacy of group convolution. MobileNet [68] applies depthwise separable convolution (DWConv) and achieves the best results among lightweight networks. ShuffleNet combines DWConv and group convolution (GConv) in a new style.
Channel shuffle operation: In early studies on efficient network design, the channel shuffle operation was seldom noticed. Recently, another study [69] applied this concept to a two-stage convolution; however, it did not examine the efficacy of channel shuffle or its application in lightweight network design.
Channel shuffle for group convolutions: New DL architectures [63–65] consist of repeated structure blocks with identical designs. Among these architectures, ResNeXt [67] and Xception [70] apply an effective GConv or DWConv in the structure blocks to achieve an outstanding trade-off between computational cost and representation capacity. However, neither architecture fully accounts for the pointwise convolutions [68] (or one-by-one convolutions), which require significant complexity. In a small network, costly one-by-one convolutions force us to restrict the number of channels to meet the complexity constraint. One possible approach to handle this limitation is to apply GConv on the one-by-one layers. GConv remarkably decreases the computation cost by ensuring that each convolution runs only on its corresponding input channel group.
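To make the savings concrete: grouping a one-by-one convolution divides its weight (and multiply-accumulate) count by the number of groups, since each group only connects its own slice of input channels to its own slice of output channels. A small sketch, with illustrative channel counts of our choosing:

```python
def conv1x1_params(c_in, c_out, groups=1):
    """Weight count of a 1x1 convolution: each of the `groups` groups maps
    c_in/groups input channels to c_out/groups output channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups)

dense = conv1x1_params(240, 240)       # ordinary pointwise convolution
grouped = conv1x1_params(240, 240, 3)  # grouped pointwise convolution, g = 3
```

With g = 3 the cost drops to a third, which is how GConv keeps pointwise layers affordable in a small network.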
The ShuffleNet unit was invented for lightweight models and benefits from the channel shuffle operation. The ShuffleNet unit takes its idea from the bottleneck unit [64] (see Figure 2a in reference [61]). To build a ShuffleNet unit, the first one-by-one layer in the bottleneck unit is replaced by a pointwise GConv followed by a channel shuffle operation (see Figure 2b in reference [61]). The second pointwise GConv restores the channel dimension to match the shortcut path.
There are two types of ShuffleNet units: nonstride and stride. Two modifications are made to create the ShuffleNet unit with stride (see Figure 2c in reference [61]). First, a three-by-three average pooling is added to the shortcut path. Then, the elementwise addition is replaced by channel concatenation, which makes it simple to expand the channel dimension at a small additional computational cost. The ShuffleNet architecture is built by stacking ShuffleNet units and is organized into three stages. The first block in every stage is implemented with a stride of two. Other parameters within the same stage stay identical, and the number of output channels doubles in the following stage.
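The channel shuffle operation that links these units is simply a reshape and transpose over the channel axis; a minimal NumPy sketch for the (N, C, H, W) layout:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups: reshape (N, C, H, W) to
    (N, g, C/g, H, W), swap the two channel axes, and flatten back,
    so each group's output mixes channels from every input group."""
    n, c, h, w = x.shape
    assert c % groups == 0
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

# Six channels in two groups [0,1,2 | 3,4,5] interleave to [0,3,1,4,2,5].
x = np.arange(6).reshape(1, 6, 1, 1)
shuffled = channel_shuffle(x, groups=2)
```

Because the operation is a pure memory permutation, it adds essentially no computation between the two grouped pointwise convolutions.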
The proposed wide-ShuffleNet develops from ShuffleNet units and the idea of skip connections. He et al. [64] introduced skip connections that bypass one or more layers (see Figure 2 in reference [64]). Skip connections form the fundamental unit of the residual network, known as the residual module [71]. They maintain feature information across all layers, allowing longer networks while keeping the number of parameters low. Instead of learning a desired mapping (denoted H(x)), a network with skip connections learns a residual mapping F(x) = H(x) − x and outputs F(x) + x. Skip connections perform identity mapping and add the result to the output of the two convolutional layers (see Figure 2 in reference [64]).
Next, we apply a long variant of the skip connection to extend the width of the DL architecture. An architecture with a long residual connection converges more quickly and offers excellent performance (see reference [72]). A long residual connection helps the network increase accuracy because it enhances feature reuse throughout the entire network. It also helps the network capture both the detailed and general characteristics of objects. A one-by-one convolution layer is inserted in the shortcut connection to create a long residual connection between layers of different sizes, making the two inputs of the addition layer equal in size. Figure 3 shows the final architecture of the proposed wide-ShuffleNet. ShuffleNet units 1, 5, and 13 are stride units, whereas the others are nonstride units. In the three skip connections, we use three skip convolution layers with the same 1 × 1 kernel size to connect the input layers of ShuffleNet units 1, 5, and 13 with the output layers of ShuffleNet units 4, 12, and 16. Additionally, we insert a batch normalization (BN) layer after every skip convolution layer for several reasons. The DL training process becomes faster when applying BN. Increasing the depth of the network makes the training process more challenging because of the many problems faced during training. With the BN layer, the architecture provides greater test accuracy than the original model (without this layer). BN decreases the internal covariate shift, thus improving the performance of the network; classification accuracy increases significantly when applying BN. The BN layer is positioned after the convolutional layer and before the leaky ReLU layer. This structure speeds up training and decreases testing and training time [73]. Furthermore, we replace all ReLU layers in all ShuffleNet units with leaky ReLU layers, as leaky ReLU gives better results than the ReLU activation function (see reference [74]).
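The shape-matching role of the one-by-one shortcut convolution can be sketched as follows. This is a simplified NumPy illustration under our own naming, not the paper's implementation; the BN layer's learned scale and shift and any spatial downsampling are omitted for brevity.

```python
import numpy as np

def pointwise_conv(x, weight):
    """1x1 convolution as channel mixing: weight (C_out, C_in), x (N, C_in, H, W)."""
    return np.einsum('oc,nchw->nohw', weight, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def long_skip(shallow, deep, weight):
    """Long residual connection: project the shallow feature map to the deep
    map's channel count with a 1x1 conv, then add elementwise."""
    return deep + leaky_relu(pointwise_conv(shallow, weight))

rng = np.random.default_rng(1)
shallow = rng.normal(size=(1, 4, 2, 2))  # e.g., input of an early ShuffleNet unit
deep = rng.normal(size=(1, 8, 2, 2))     # e.g., output of a later ShuffleNet unit
out = long_skip(shallow, deep, rng.normal(size=(8, 4)))
```

The 1 × 1 projection is what lets feature maps with different channel counts meet at the addition layer, which is the whole purpose of the skip convolution layers described above.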

Experiment
In this section, we present the numerical results.

Datasets
The dataset plays an essential role in evaluating the performance of the proposed framework. We test the proposed method on the two datasets: HAM10000 and ISIC 2019.
HAM10000 is a benchmark dermatoscopic database [75]. It comprises more than 10,000 dermatoscopic pictures obtained from many people worldwide. The HAM10000 database also holds metadata in comma-separated-values (.csv) format, containing gender, age, and cell class. The dataset consists of seven different types of skin diseases: actinic keratoses and intraepithelial carcinoma (AKIEC), basal cell carcinoma (BCC), benign keratosis-like lesions (BKL), dermatofibroma (DF), melanoma (MEL), melanocytic nevi (NV), and vascular lesions (VASC). The principal problem of the HAM10000 database is class imbalance and the irregular distribution of skin disease counts. The NV class exceeds 70% of the total number of images; this factor influences training and creates an extremely imbalanced database. The second largest class is BKL, with approximately 13% of the pictures. The other classes contribute a small number of images; in particular, less than 2% of the total images belong to the DF class, which is the most difficult class to predict. Figure 4 shows some sample images from the HAM10000 dataset. The HAM10000 dataset is part of the ISIC 2018 challenge, which has three tasks: lesion boundary segmentation (task 1), lesion attribute detection (task 2), and disease classification (task 3). The second dataset is ISIC-2019 [76], consisting of 25,331 dermoscopic pictures belonging to eight categories: AKIEC, BCC, BKL, DF, MEL, NV, VASC, and squamous cell carcinoma (SCC). ISIC-2019 data are obtained from the following sources: the BCN_20000, HAM10000, and MSK datasets. The ISIC-2019 challenge has only one task: disease classification. Table 2 gives the class distribution of the two datasets, HAM10000 and ISIC-2019.

Evaluation
We apply the following metrics to evaluate the performance of the proposed method.
Accuracy = (TP + TN) / (TP + TN + FP + FN),

Sensitivity = TP / (TP + FN),

Specificity = TN / (TN + FP),

Precision = TP / (TP + FP),

F1 score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity),

where TN is the true negative, TP is the true positive, FN is the false negative, and FP is the false positive.
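These definitions translate directly into code; the helper below is an illustrative sketch that computes the per-class values (for a multiclass table these would then be macro-averaged across classes):

```python
def classification_metrics(tp, tn, fp, fn):
    """Per-class evaluation metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)       # true-positive rate (recall)
    specificity = tn / (tn + fp)       # true-negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Example counts (hypothetical, for illustration only).
m = classification_metrics(tp=5, tn=90, fp=10, fn=5)
```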

Implementation Details
The original HAM10000 and ISIC-2019 databases can be downloaded from the link in the Data Availability Statement. All tests were conducted on an Intel Core i7-8700 PC with 32 GB of memory, an NVIDIA 1070 GPU, and MATLAB 9.8 (R2020a, Natick, MA, USA). For network training with SGD, we apply an initial learning rate of 0.001, a mini-batch size of 32, and a momentum of 0.9. After every 20 epochs, we halve the learning rate.
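The step schedule described above can be written as a one-line helper (a sketch of the stated setup, not the authors' training script; the function name is ours):

```python
def sgd_learning_rate(epoch, base_lr=0.001, halve_every=20):
    """Initial LR 0.001, halved after every 20 epochs (epochs are 0-indexed)."""
    return base_lr * 0.5 ** (epoch // halve_every)

schedule = [sgd_learning_rate(e) for e in (0, 19, 20, 40)]
```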

Comparison of the HAM10000 and ISIC 2019 Datasets
We evaluate the proposed method on two big datasets, HAM10000 and ISIC 2019, and compare the performance of the proposed framework with that of state-of-the-art approaches for skin lesion classification, including non-segmentation and segmentation approaches. We split the two datasets into training and testing parts with the same proportions used in references [23,29,76]. We implemented three different experiments to make a fair comparison between the proposed EW-FCM framework and other methods, because each method uses a different dataset and a different proportion of testing data; for example, Thurnhofer-Hemsi et al. [23] use 20% of the HAM10000 dataset as testing data, while Srinivasu et al. [29] use only 10% of the same dataset. The HAM10000 dataset is used in the first and second experiments, with testing parts of 20% and 10%, respectively. The third experiment uses the ISIC2019 dataset, with 10% of the total images in the testing part. Table 3 presents the classification results of all approaches in the three experiments.

Table 3. Comparison with different methods on the HAM10000 and ISIC2019 datasets.
The first note is that the proposed method has fewer wrong classifications than the other methods. Additionally, the number of correct predictions improves for most categories: AKIEC, BCC, DF, and MEL. BKL, NV, and VASC are the three categories in which our framework ranks third, second, and third among the four approaches, respectively. We calculate the following metrics from the obtained confusion matrix: specificity, sensitivity, precision, and F1 score. Table 4, visualized in Figure 6, presents the performance of our proposed framework and various approaches with five metrics: accuracy, specificity, sensitivity, precision, and F1 score (macro-average).

Table 3. Accuracy (ACC) of the proposed method and previous approaches in the three experiments.

Experiment 1 — HAM10000 (80% training, 20% testing)
    PNASNet [77]                       76.00%
    ResNet-50 + gcForest [78]          80.04%
    VGG-16 + GoogLeNet Ensemble [79]   81.50%
    DenseNet-121 with SVM [80]         82.70%
    DenseNet-169 [80]                  81.35%
    Bayesian DenseNet-169 [81]         83.59%
    Shifted MobileNetV2 [23]           81.90%
    Shifted GoogLeNet [23]             80.50%
    Shifted 2-Nets [23]                83.20%
    The proposed method                84.80%

Experiment 2 — HAM10000 (90% training, 10% testing)
    HARTS [82]                         77.00%
    FTNN [83]                          79.00%
    CNN [84]                           80.00%
    VGG19 [85]                         81.00%
    MobileNet V1 [68]                  82.00%
    MobileNet V2 [86]                  84.00%
    MobileNet V2-LSTM [29]             85.34%
    The proposed method                86.33%

Experiment 3 — ISIC 2019 (90% training, 10% testing)
    VGG19 [85]                         80.17%
    ResNet-152 [64]                    84.15%
    EfficientNet-B0 [87]               81.75%
    EfficientNet-B7 [87]               84.87%
    The proposed method                82.56%

In the second experiment, the HAM10000 database is split into two parts: 90% of the images form the training set and the remaining 10% the testing set. None of the methods in Table 3 for the second experiment report precision, F1 score, or a confusion matrix (micro-average metrics), so we evaluate all methods with three metrics: accuracy, sensitivity, and specificity. Table 5 presents the outcomes of all approaches, and Figure 7 visualizes them. Our framework achieves the highest accuracy and specificity, whereas MobileNet V2 with long short-term memory (LSTM) [29] attains the highest sensitivity. MobileNet V2 is more efficient than the first version, MobileNet V1; its parameter count is approximately 19% lower.
In the third experiment, we follow the dataset split of the previous study [76]. The authors of [76] tested skin classification on the ISIC 2019 dataset with different transfer learning models, including the state-of-the-art EfficientNet architecture. EfficientNet comprises eight models, B0 through B7, and achieves better efficiency and accuracy than earlier ConvNets; it employs the swish activation instead of ReLU (see reference [76]). Table 6 compares all approaches; every method except ours reports only the accuracy metric. EfficientNet-B0 has the fewest parameters of the eight EfficientNet architectures. Even so, its total parameter count is approximately three times that of our method (5 M versus 1.8 M), whereas its accuracy is lower (81.75% versus 82.56%). EfficientNet-B7 and ResNet-152 rank first and second in accuracy, respectively. Both architectures have many parameters (66 M and 50 M, respectively) and achieve better results than our method, although the proposed network uses less than 4% of the parameters of these two models. VGG19 is the worst method, with the highest parameter count (143 M) and the lowest accuracy (80.17%).
We have now compared the different methods in three experiments. Even with the highest efficiency, our method has a weakness: it does not handle the imbalanced classes of the two datasets, HAM10000 and ISIC2019. Data sampling methods, which we leave to future work, can balance the class distributions.
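The class-imbalance limitation noted above is commonly addressed by data sampling. The following is a minimal sketch of naive random oversampling (our illustration of the future-work direction, not part of the evaluated pipeline):

```python
import random

def oversample_to_balance(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class has
    as many examples as the largest class. `samples` and `labels` are
    parallel sequences; returns new balanced lists."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # keep all originals, then draw random duplicates up to `target`
        picked = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(picked)
        out_y.extend([y] * target)
    return out_x, out_y
```

In practice, oversampling is applied only to the training split (never the test split), typically combined with the augmentations already used during training.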

Comparison with Segmentation Methods
The proposed method has higher accuracy and is more efficient than the other nonsegmentation approaches. In this section, we compare the proposed EW-FCM with other segmentation techniques, both non-DL and DL. All segmentation methods use the full lesion image (background and foreground) as the input to the classification network.
First, we compare the proposed EW-FCM with other non-DL methods: the original Otsu, Otsu momentum, and an EW scheme. Table 7 presents the results; EW-FCM achieves the highest accuracy among the non-DL segmentation approaches. Figure 8 shows the segmentation results of the various methods.
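For reference, the classic Otsu baseline compared above can be sketched as follows. This is an illustrative reimplementation, not the paper's code, and it does not reproduce EW-FCM's entropy weighting or first-order cumulative moment:

```python
def otsu_threshold(gray):
    """Classic Otsu threshold on a flat sequence of 8-bit gray values:
    choose the level that maximizes the between-class variance of the
    background/foreground split."""
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]                  # background pixel count
        if w0 == 0:
            continue
        w1 = total - w0                # foreground pixel count
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0                # background mean
        mu1 = (sum_all - sum0) / w1    # foreground mean
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

A lesion mask is then obtained by thresholding each pixel against the returned level; EW-FCM additionally weights the histogram before the split, which is what improves the accuracy in Table 7.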

Second, we compare the proposed EW-FCM with DL segmentation methods. Al-Masni et al. [55] provide only the training accuracy on ISIC 2018 task 3 (the HAM10000 dataset); they use the DL network FrCN to segment the skin lesion and classify the full lesion images with various networks, such as Inception-ResNet-v2, DenseNet-201, and Inception-v3. Meanwhile, Son et al. [54] crop the segmented image with U-Net and feed the result to EfficientNet-B0 for classification. We adopt the U-Net and EfficientNet-B0 pipeline of [54] to evaluate the HAM10000 dataset, using U-Net as the segmentation method with the full lesion image (background and foreground). EW-FCM achieves lower accuracy than the DL segmentation methods but higher accuracy than the non-DL segmentation methods (see Table 7). The DL segmentation methods have two main drawbacks. First, building a labeled training database, in which each picture has an associated ground truth, is the primary challenge in creating a DL segmentation network; obtaining the ground truth pixel-wise segmentation requires a massive effort from dermatologists. The proposed EW-FCM, by contrast, uses a threshold technique for image segmentation and does not depend on ground truth. As a result, we cannot perform a quantitative analysis of the EW-FCM segmentation itself (e.g., with the Jaccard index). Second, using DL for both segmentation and classification increases complexity: for instance, the DL segmentation network U-Net and the DL classifier EfficientNet-B0 together require 12.7 M parameters (7.7 M + 5 M).
EW-FCM uses DL only for the classification stage, thus decreasing the complexity and making the framework suitable for portable systems.
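The complexity argument can be tallied per pipeline. The per-stage counts below are the figures quoted in the text; the helper itself is only illustrative:

```python
def pipeline_params_m(stages):
    """Total parameter budget of a pipeline, in millions of parameters,
    given a mapping of stage name -> parameter count (in M)."""
    return sum(stages.values())

# DL segmentation + DL classification (U-Net + EfficientNet-B0), per the text
dl_pipeline = {"U-Net": 7.7, "EfficientNet-B0": 5.0}
# Proposed: threshold-based EW-FCM (no learned parameters) + wide-ShuffleNet
proposed = {"EW-FCM": 0.0, "wide-ShuffleNet": 1.8}
```

The DL pipeline thus carries roughly seven times the parameter budget of the proposed one, which is the gap that matters for portable deployment.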
Next, we evaluate the performance of skin classification with various classifiers and present the results in Table 8. The proposed method improves skin classification for two reasons: wide-ShuffleNet outperforms ShuffleNet, and the new segmentation technique outperforms using the original image. EfficientNet-B0 has the highest accuracy, but approximately three times as many parameters as wide-ShuffleNet. In future studies, we will investigate the performance of EW-FCM with ShuffleNet V2 (2.4 M parameters) and compare it with the proposed wide-ShuffleNet (1.8 M parameters).

Conclusions
Skin cancer is one of the most dangerous diseases in humans. The automated classification of skin lesions using DL will save physicians time and increase the curing rate. Typical DL frameworks require many parameters and cannot run on mobile systems. Hence, developing a lightweight DL framework for skin lesion classification is essential.
In this paper, we propose a novel method for skin lesion classification. Our lightweight method addresses the limitation of earlier studies that evaluated only a small number of skin lesion images. The numerical results show that the proposed framework is more efficient and accurate than 20 other approaches (see Table 3). The proposed method uses approximately 79 times fewer parameters than VGG19 while achieving higher accuracy. Additionally, the proposed method achieves higher accuracy than the other nonsegmentation and non-DL segmentation methods, and results close to those of the DL segmentation methods while avoiding their complexity. Our framework does not require ground truth for image segmentation, whereas DL segmentation methods cannot work without it; the proposed method thus spares dermatologists the effort of manually outlining ground truth pixel-wise segmentations. We obtain an accurate and efficient framework by combining the new EW-FCM segmentation technique with wide-ShuffleNet.
In future work, we will compare the proposed framework with more networks. Another direction for research and development is integrating the proposed method into real-world applications such as mobile healthcare systems.