Scalable Skin Lesion Multi-Classification Recognition System

Abstract: Skin lesion recognition is an important challenge in the medical field. In this paper, we implement an intelligent classification system based on convolutional neural networks (CNNs). The system first determines whether an input image is a dermoscopic image, with an accuracy of 99%, and then diagnoses dermoscopic and non-dermoscopic images separately. Because of data limitations, the only disease we can recognize from non-dermoscopic images is vitiligo. We propose a vitiligo recognition method that averages the probabilities output by three structurally identical CNN models; it is more effective and robust than the traditional image recognition method based on the RGB color space. The dermoscopic classification model classifies 7 types of skin lesions, uses weighted optimization to cope with the unbalanced dataset, and greatly improves sensitivity through model fusion. Further optimization and extension of the system depend on enlarging the database.


Introduction
Human skin is one of the most important organs of the human body. It covers the body surface, separates the body from the external environment, and accounts for about 15% of an adult's body weight [Kanitakis (2002)]. The skin acts as a natural defense barrier that blocks most harmful agents such as bacteria and viruses, but skin diseases can compromise it. More than 5 million cases of skin disease occur each year. A large share of these cases are caused by psoriasis, acne and eczema, and others by vitiligo, melanoma, etc. Some of these skin diseases, such as melanoma, can be fatal [Pathan, Prabhu and Siddalingaswamy (2018)], and some, such as vitiligo, can greatly affect a person's appearance. Skin lesions also differ in size. Some, such as those of vitiligo and psoriasis, are directly visible to the naked eye and are convenient to observe and diagnose, while others are difficult to observe and must be magnified by physical means; the advent of the dermatoscope has made the examination of such lesions possible [Tschandl and Wiesner (2018)]. In short, skin diseases can have a great impact on people's … This means that we cannot achieve accurate classification of all skin diseases. However, this skin lesion classification system is scalable owing to the transferability of convolutional neural networks.

Organization
The structure of this paper is as follows. In Section 2.1, we discuss the skin lesion datasets. In Section 2.2, we describe the hardware setup, the pretrained model and the data preprocessing, and we separate dermoscopic lesion images from non-dermoscopic images. In Section 3.1, we distinguish vitiligo lesions from normal skin in non-dermoscopic images. In Section 3.2, we classify multiple types of dermoscopic lesion images. In Section 4, we present the experimental results of our system, and in Section 5, we summarize the paper and outline future work.

Datasets
In this article, we use two datasets. The first is a vitiligo database from a hospital containing 38,677 images of size 768×576, including both vitiligo and normal images. The second is a dermatoscope dataset based on the ISIC 2018 challenge, Skin Lesion Analysis Towards Melanoma Detection [Codella, Gutman, Celebi et al. (2018), Tschandl, Rosendahl and Kittler (2018)], containing 10,015 images of size 450×600 that cover 7 types of skin lesions such as melanoma, as shown in Fig. 1. The distribution of each dataset is shown in Tab. 1. As the table shows, the number of images differs greatly between the classes of the dermatoscope dataset; such imbalance can prevent the model from learning the characteristics of the minority classes and lead to misclassification. This is the problem we try to address.

Condition
First, our system needs to separate dermoscopic images from non-dermoscopic images. We split the data into training and test sets at a ratio of 8:2. Because the images in the two datasets have different sizes, we resize all of them to 224×224. Starting from a model pretrained on ImageNet, we train with a batch size of 20 on 4 TITAN X GPUs. For data preprocessing, we apply random horizontal and vertical flips, normalization and luminance perturbation. The learning rate is set to 0.0001, and training runs for a total of 20 epochs.
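The split and preprocessing described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the brightness range `max_delta`, the ImageNet mean/std used for normalization, the `[0, 1]` pixel scale and the function names `split_indices`, `augment` and `normalize` are all assumptions.

```python
import numpy as np

def split_indices(n, train_ratio=0.8, seed=0):
    """Shuffle sample indices and split them into train/test (8:2 by default)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(n * train_ratio))
    return idx[:n_train], idx[n_train:]

def augment(image, rng, max_delta=0.1):
    """Random horizontal/vertical flips plus a random brightness shift.

    `image` is an H x W x 3 float array in [0, 1] (already resized to 224 x 224).
    `max_delta` is an assumed brightness range, not a value from the paper.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :, :]  # vertical flip
    image = image + rng.uniform(-max_delta, max_delta)  # luminance perturbation
    return np.clip(image, 0.0, 1.0)

def normalize(image, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Per-channel normalization; the ImageNet statistics are an assumption."""
    return (image - np.asarray(mean)) / np.asarray(std)

rng = np.random.default_rng(42)
train_idx, test_idx = split_indices(100)
img = rng.random((224, 224, 3))
x = normalize(augment(img, rng))
print(len(train_idx), len(test_idx), x.shape)  # 80 20 (224, 224, 3)
```

Augmentation is applied only to training images; normalization is applied to every image before it reaches the network.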

Experiments
In our multi-dermatology classification system, we first use a binary classification model to determine whether an image is dermoscopic or non-dermoscopic. A dermoscopic image is sent to the multi-disease dermoscopic classification model; otherwise, the image is sent to the vitiligo classification model. In this paper, Resnet20 is used to distinguish dermoscopic from non-dermoscopic images; it reaches an accuracy of 99% and is robust. Next, we introduce the implementation of the two downstream models. The process of building the system is shown in Fig. 2.
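The cascade can be expressed as a small routing function. In this sketch the three models are passed in as plain callables; `cascade_predict`, the 0.5 gate threshold and the stub models in the usage lines (including the class names `mel`, `nv` and `bcc`) are hypothetical placeholders for the trained Resnet20 gate, the dermoscopic model and the vitiligo model.

```python
from typing import Callable, Dict

def cascade_predict(image,
                    gate: Callable[[object], float],
                    dermoscopic_model: Callable[[object], Dict[str, float]],
                    vitiligo_model: Callable[[object], float],
                    threshold: float = 0.5) -> dict:
    """Route an image through the binary gate, then to the matching specialist model."""
    if gate(image) > threshold:  # gate returns P(dermoscopic)
        probs = dermoscopic_model(image)  # 7-class probabilities in the real system
        label = max(probs, key=probs.get)
        return {"branch": "dermoscopic", "label": label}
    p_vitiligo = vitiligo_model(image)  # P(vitiligo) from the vitiligo model
    label = "vitiligo" if p_vitiligo > threshold else "normal"
    return {"branch": "non-dermoscopic", "label": label}

# Hypothetical stub models for illustration only.
def gate(img):
    return 0.9 if img == "dermoscopic.jpg" else 0.1

def derm(img):
    return {"mel": 0.7, "nv": 0.2, "bcc": 0.1}

def vit(img):
    return 0.8

print(cascade_predict("dermoscopic.jpg", gate, derm, vit))  # {'branch': 'dermoscopic', 'label': 'mel'}
print(cascade_predict("photo.jpg", gate, derm, vit))  # {'branch': 'non-dermoscopic', 'label': 'vitiligo'}
```

Keeping the gate separate means a new specialist model can be added as another branch without retraining the existing ones, which is the sense in which the system is scalable.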

Vitiligo
According to available statistics, the worldwide prevalence of vitiligo is about 0.5-2% [Krüger and Schallreuter (2012); Lotti and D'Erme (2014)]. Vitiligo is a common pigmentary skin disease characterized by localized or generalized loss of pigmentation and the formation of white patches. When vitiligo is at an early stage, the patches are few, treatment is effective, the harm to the body is small and the treatment cost is low. Early diagnosis of vitiligo is therefore very important, which makes automatic diagnosis of vitiligo highly significant. In this paper, we propose a method based on three identical CNN models trained on images of the same vitiligo dataset in three different color spaces (RGB, HSV and YCrCb); the combined result is better than that of a single model. A CNN usually consists of two parts, automatic feature extraction and fully connected layers, where feature extraction comprises convolutional and pooling layers. The advantage of a CNN is that a good classification model can be obtained without manually selecting features. We use the Keras framework [Chollet (2018)] in this experiment; it provides many pretrained models, which shortens development time, and it exposes the power of CNNs through a simple interface. We compare four commonly used CNN models, ResNet50, VGG16, Xception and InceptionV3 [Litjens, Kooi, Bejnordi et al. (2017)]. In preliminary experiments, ResNet50 outperforms the other models on many metrics and is therefore more suitable for classifying vitiligo data, as shown in Tab. 2. We attribute this to the residual modules of ResNet, which allow the network to converge toward an identity mapping and keep the error rate from growing as depth increases. Therefore, ResNet50 is used as the baseline model, shown in Fig. 3.
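The identity-mapping argument can be checked numerically. The NumPy sketch below implements a simplified two-layer residual unit, y = W2·φ(W1·x) + x with φ the ReLU (biases and BatchNorm are omitted, which is an assumption); `residual_block` is a hypothetical name. When the residual branch outputs zero, the block reduces exactly to the identity; the same idea is commonly given as the motivation for zero-initializing the BatchNorm layer of each residual module.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Simplified residual unit: y = F(x, {W1, W2}) + x with F(x) = W2 @ relu(W1 @ x)."""
    return W2 @ relu(W1 @ x) + x

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
W1 = rng.standard_normal((d, d))

# With a zero residual branch the block is exactly the identity mapping.
y_identity = residual_block(x, W1, np.zeros((d, d)))
print(np.allclose(y_identity, x))  # True

# With a non-zero branch the block learns a correction on top of x.
y = residual_block(x, W1, 0.1 * rng.standard_normal((d, d)))
print(y.shape)  # (8,)
```

Because the skip connection passes x through unchanged, a block only has to learn the residual correction F(x), which is easier to drive toward zero than learning an identity mapping from scratch.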
The residual module is formulated as follows:
$$y = \mathcal{F}(x, \{W_i\}) + x, \qquad \mathcal{F}(x, \{W_i\}) = W_2\,\phi(W_1 x)$$
where $\phi$ denotes the ReLU function and $W_i$ denotes the $i$-th weight layer. The predicted vitiligo probability is the average of the output probabilities of the three models; the vitiligo recognition algorithm is detailed as follows.
Step 1: Convert the RGB image into HSV and YCrCb images.
Step 2: Normalize and augment the images with the data preprocessing methods described in the previous section, and then feed the processed images of each color space to a Keras ResNet50 model for training.
Step 3: Average the class probabilities output by the three models, $F_1(x)$, $F_2(x)$ and $F_3(x)$, where $x$ denotes the input image.
Step 4: Diagnose vitiligo. An image whose averaged confidence is greater than 0.5 is judged as vitiligo:
$$\text{Result} = \begin{cases} \text{vitiligo}, & \frac{1}{3}\sum_{i=1}^{3} F_i(x) > 0.5 \\ \text{normal}, & \frac{1}{3}\sum_{i=1}^{3} F_i(x) \le 0.5 \end{cases}$$
The models trained on images in the three color spaces achieve similar metrics individually, but combining them by probability averaging improves the metrics.
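Steps 1 to 4 can be sketched as below. The HSV conversion uses Python's `colorsys` definition, and the YCrCb conversion uses the formula documented for OpenCV's `cvtColor` on float images (BT.601 luma weights, chroma offset 0.5); both choices are assumptions, since the paper does not state its conversion code. The three `models` are hypothetical stand-ins for the ResNet50 networks trained on each color space, each assumed to return the probability of vitiligo.

```python
import colorsys
import numpy as np

_rgb_to_hsv = np.vectorize(colorsys.rgb_to_hsv, otypes=[float, float, float])

def rgb_to_hsv(rgb):
    """H x W x 3 RGB floats in [0, 1] -> HSV floats in [0, 1] (colorsys definition)."""
    h, s, v = _rgb_to_hsv(rgb[..., 0], rgb[..., 1], rgb[..., 2])
    return np.stack([h, s, v], axis=-1)

def rgb_to_ycrcb(rgb):
    """RGB floats in [0, 1] -> (Y, Cr, Cb), BT.601 weights with 0.5 chroma offset."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 0.5
    cb = (b - y) * 0.564 + 0.5
    return np.stack([y, cr, cb], axis=-1)

def predict_vitiligo(rgb, models, threshold=0.5):
    """One model per color space, average P(vitiligo), threshold at 0.5."""
    inputs = [rgb, rgb_to_hsv(rgb), rgb_to_ycrcb(rgb)]  # Step 1
    probs = [float(model(x)) for model, x in zip(models, inputs)]  # F_1, F_2, F_3
    p = sum(probs) / len(probs)  # Step 3: probability average
    return ("vitiligo" if p > threshold else "normal"), p  # Step 4

# Hypothetical stand-ins for the three trained ResNet50 models.
models = [lambda x: 0.7, lambda x: 0.6, lambda x: 0.4]
patch = np.full((4, 4, 3), [1.0, 0.0, 0.0])  # pure red test patch
print(rgb_to_hsv(patch)[0, 0])  # [0. 1. 1.]
print(predict_vitiligo(patch, models)[0])  # vitiligo
```

In the usage line, the third model alone would say "normal" (0.4), but the average (about 0.567) exceeds 0.5, which illustrates how averaging smooths out a single model's error.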

Dermoscopic
Relatively small skin lesions are difficult for doctors to detect with the naked eye and are easily misdiagnosed. Dermoscopy allows doctors to diagnose such lesions as early as possible and to intervene in time. In this paper, we design a CNN model to assist doctors; it classifies seven types of skin lesions under dermoscopy and aims to reduce the misdiagnosis rate as much as possible. In the dermoscopic classification system, we split the dataset into training, validation and test sets at a ratio of 8:1:1 and train for a total of 150 epochs. The other data preprocessing methods are the same as before. Because the numbers of images in the seven classes of the dermoscopic dataset differ greatly, we use a class-weighted loss for the unbalanced classes, with each class weight inversely proportional to that class's share of the data, so that the model learns the characteristics of all classes equally. We fine-tune several pretrained CNN models: DenseNet201, ResNet101, SENet154, SE-ResNeXt101 and DPN68b. These models are deep and have many parameters, which helps them learn better, and different models learn differently, so the fusion model built later is expected to combine their advantages and achieve higher recognition accuracy. Taking ResNet101 as an example, we fine-tune the model and apply zero initialization to the BatchNorm layer in each residual module, which slightly improves accuracy. In addition, we replace the fully connected layer with a multi-layer perceptron (MLP) structure, which significantly increases sensitivity. Next, on the validation and test sets we expand the data by multiple cropping: we crop N copies at different positions around the center of each original image.
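Two steps from this paragraph, the inverse-frequency class weights and the multi-crop averaging, are sketched below in NumPy. The weight formula w_c = N / (K·n_c) is one common way to make weights inversely proportional to class frequency. The paper's own cropping formula is not reproduced here, so the k×k grid of crop positions (N = k², e.g. 36 = 6×6), the 224-pixel crop size and all function names are assumptions.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """w_c = N / (K * n_c), i.e. inversely proportional to each class's share.

    Assumes every class occurs at least once in `labels`.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return len(labels) / (num_classes * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Mean over the batch of -w_y * log p_y (probs are softmax outputs)."""
    p_true = probs[np.arange(len(labels)), labels]
    return float(np.mean(-weights[labels] * np.log(p_true)))

def grid_crops(image, crop, n):
    """n = k*k crops whose positions form a k x k grid, roughly symmetric about the centre."""
    k = int(round(np.sqrt(n)))
    if k * k != n:
        raise ValueError("n must be a perfect square, e.g. 16, 36 or 64")
    h, w = image.shape[:2]
    ys = np.linspace(0, h - crop, k).round().astype(int)
    xs = np.linspace(0, w - crop, k).round().astype(int)
    return [image[y:y + crop, x:x + crop] for y in ys for x in xs]

def multi_crop_predict(model, image, crop=224, n=36):
    """Average the model's output over n crops (test-time augmentation)."""
    return np.mean([model(c) for c in grid_crops(image, crop, n)], axis=0)

labels = np.array([0, 0, 0, 1])
w = inverse_frequency_weights(labels, 2)
print(w)  # approx. [0.667, 2.0]: the minority class gets 3x the weight

image = np.zeros((450, 600, 3))
stub_model = lambda c: np.array([c.mean(), 1.0 - c.mean()])  # hypothetical 2-class model
print(len(grid_crops(image, 224, 36)), multi_crop_predict(stub_model, image))  # 36 [0. 1.]
```

Averaging over a fixed grid rather than a single random crop makes the prediction deterministic, which matches the motivation given in the text for reducing randomness.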
We average the predictions on the N cropped images to obtain the final result. We compare no cropping with N = 16, 36 and 64, and N = 36 gives the best results. Compared with the metrics obtained from a single random crop, this method improves performance considerably. Cropping each validation and test image into 36 copies at different positions around its center reduces the randomness of a single crop and makes recognition more accurate for some lesion images. With this method, we compute the metrics of each model on the validation and test sets and take their mean. The results of DenseNet201, ResNet101, SENet154, SE-ResNeXt101, DPN68b and ResNet101MLP are listed in Tab. 3. Among these models, SENet154 performs best overall, while ResNet101 with the MLP structure (shown in Fig. 4) achieves the best sensitivity. Since we want high sensitivity together with good overall performance, we use a model ensemble to combine the advantages of the individual models. On the validation set, we combine the ensemble with a Bayesian optimization search strategy. We select the following models for the ensemble: DenseNet201, ResNet101, SENet154, SE-ResNeXt101, DPN68b and ResNet101MLP. We extract each model's output vector before the softmax layer, concatenate the vectors that these models produce for each validation image, and train a RandomForest classifier on the concatenated vectors. Common methods for automatic hyperparameter search include grid search, random search and Bayesian optimization; because the first two are inefficient, we adopt Bayesian optimization, whose implementation is described in Algorithm 1.
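Building the stacking features amounts to concatenating each model's pre-softmax vector for the same image. The sketch below shows only that step; `stack_logits` is a hypothetical helper, the random logits stand in for real model outputs, and the downstream classifier (for example scikit-learn's `RandomForestClassifier`) is mentioned in a comment rather than run.

```python
import numpy as np

def stack_logits(per_model_logits):
    """Concatenate pre-softmax vectors from several models, one feature row per image.

    per_model_logits: list of arrays, each (n_images, n_classes), rows in the same image order.
    Returns an array of shape (n_images, n_models * n_classes).
    """
    if len({a.shape[0] for a in per_model_logits}) != 1:
        raise ValueError("every model must score the same images")
    return np.concatenate(per_model_logits, axis=1)

rng = np.random.default_rng(0)
n_images, n_classes = 10, 7
model_names = ["DenseNet201", "ResNet101", "SENet154", "SE-ResNeXt101", "DPN68b", "ResNet101MLP"]
# Random numbers stand in for the real pre-softmax outputs of each model.
logits = [rng.standard_normal((n_images, n_classes)) for _ in model_names]
features = stack_logits(logits)
print(features.shape)  # (10, 42)
# These features (with the validation labels) would then train the second-stage
# classifier, e.g. sklearn.ensemble.RandomForestClassifier(class_weight="balanced").
```

Using pre-softmax vectors keeps the relative confidence information that the softmax would compress, which gives the second-stage classifier more to work with.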

Algorithm 1 Sequential Model-Based Optimization
where f is the unknown objective function, X is the input data, S is the acquisition function and M is the model hypothesized from the observed data. We first build an initial dataset from the input data and then loop T times, selecting a new set of parameters in each iteration. We choose a Gaussian process as the model; its mean function µ and covariance function $K(x, x')$ are fixed, written as $f \sim GP(\mu, K)$. When a Gaussian process is used as the prior for Bayesian inference, the posterior can be used to predict new data. Suppose $y$ denotes the function values known from the training data and $y_*$ the function values at the test inputs $x_*$; µ is the mean of the training set, $\mu_*$ is the mean of the test set, $\Sigma$ is the covariance of the training set and $\Sigma_{**}$ is the covariance of the test set.
A Gaussian process extends the multivariate Gaussian distribution to infinite dimensions, so a training set $\mathbf{y} = [y_1, y_2, \ldots, y_n]^T$ can be represented as a sample drawn from a multivariate Gaussian distribution. We set the mean of the Gaussian process to 0, and the most common choice of covariance is the squared exponential, see Eq. (6):
$$k(x, x') = \sigma_f^2 \exp\left(-\frac{\lVert x - x' \rVert^2}{2l^2}\right) \qquad (6)$$
Because observations contain noise, we express the covariance as in Eqs. (7) and (8), where $\delta(x, x')$ is the Kronecker delta function. Besides the covariance matrix $K$ of the training set, see Eq. (9), we also need the covariance $K_*$ between the new input and the training inputs, see Eq. (10), and the training outputs satisfy $\mathbf{y} \sim N(0, K)$. The hyperparameters to be determined are $\theta = [\sigma_f^2, l]$; since the training set follows a multivariate normal distribution, its likelihood function is given by Eq. (21). Bayesian optimization maps each candidate x to a real value through the acquisition function, which indicates how likely the objective value at that point is to exceed the current optimum. Two types of acquisition function are commonly used. The first is the probability of improvement (POI), see Eq. (22):
$$\mathrm{POI}(x) = P\big(f(x) \ge f(x^+) + \xi\big) = \Phi\left(\frac{\mu(x) - f(x^+) - \xi}{\sigma(x)}\right) \qquad (22)$$
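The GP quantities above can be computed with a Cholesky factorization. This NumPy sketch assumes 1-D inputs and fixed hyperparameters σ_f, l and σ_n (in practice θ would be chosen by maximizing the log likelihood it returns); `sq_exp_kernel` and `gp_posterior` are hypothetical names.

```python
import numpy as np

def sq_exp_kernel(a, b, sigma_f=1.0, length=1.0):
    """Squared exponential k(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 l^2)) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-0.5 * d**2 / length**2)

def gp_posterior(x_train, y_train, x_test, sigma_f=1.0, length=1.0, sigma_n=1e-2):
    """Zero-mean GP posterior plus the log marginal likelihood of the training data."""
    n = len(x_train)
    # Noise enters only on the diagonal, i.e. sigma_n^2 * delta(x, x').
    K = sq_exp_kernel(x_train, x_train, sigma_f, length) + sigma_n**2 * np.eye(n)
    K_s = sq_exp_kernel(x_test, x_train, sigma_f, length)  # test vs. training inputs
    K_ss = sq_exp_kernel(x_test, x_test, sigma_f, length)  # test vs. test inputs
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    # log p(y | X, theta) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi)
    log_lik = -0.5 * y_train @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)
    return mean, cov, log_lik

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(x)
mean, cov, log_lik = gp_posterior(x, y, x)
print(np.max(np.abs(mean - y)) < 0.05, np.isfinite(log_lik))  # True True
```

Far from the training inputs the posterior reverts to the prior (mean 0, variance σ_f²), which is what gives the acquisition functions their exploration behavior.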
where $f(x)$ is the objective function value at $x$, $f(x^+)$ is the best objective value found so far, $\mu(x)$ and $\sigma(x)$ are the mean and standard deviation of the objective function predicted by the Gaussian process, $\Phi$ is the standard normal cumulative distribution function, and $\xi$ is a trade-off factor that controls whether the search selects points around $x^+$ or explores elsewhere. In general, we use Monte Carlo simulation to find the $x$ that maximizes $\mathrm{POI}(x)$.
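A minimal sketch of POI and of its maximization by random (Monte Carlo) sampling of candidate points, using only the standard library. The toy surrogate (`mu_fn` and `sigma_fn` as plain functions), the default ξ = 0.01 and the sampling budget are assumptions for illustration.

```python
import random
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    """POI(x) = Phi((mu(x) - f(x+) - xi) / sigma(x)) for a maximization problem."""
    if sigma <= 0.0:
        return 1.0 if mu > f_best + xi else 0.0
    return norm_cdf((mu - f_best - xi) / sigma)

def maximize_poi(mu_fn, sigma_fn, f_best, low, high, n_samples=1000, seed=0):
    """Monte Carlo search: sample candidates uniformly and keep the one with the largest POI."""
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(n_samples)]
    return max(candidates,
               key=lambda x: probability_of_improvement(mu_fn(x), sigma_fn(x), f_best))

# Toy surrogate (an assumption): predicted mean peaks at x = 1, constant uncertainty.
def mu_fn(x):
    return -(x - 1.0) ** 2

def sigma_fn(x):
    return 0.1

x_next = maximize_poi(mu_fn, sigma_fn, f_best=-0.5, low=-2.0, high=4.0)
print(x_next)  # a candidate close to 1.0
```

Because POI only asks whether an improvement happens, not how large it is, it tends to pick points right next to the current best; increasing ξ pushes it to explore.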
The second type is expected improvement (EI). Because POI is a probability, it only considers the probability that $f(x)$ is larger than $f(x^+)$; EI is an expectation, so it also considers how much larger $f(x)$ is than $f(x^+)$. We choose $x$ by Eq. (23):
$$x_{t+1} = \arg\max_{x} \; \mathbb{E}\left[\max\left(0, f(x) - f(x^+)\right) \mid D_t\right] \qquad (23)$$
where $D_t$ denotes the first $t$ samples. Under the normal-distribution assumption, EI has the closed form of Eq. (24):
$$\mathrm{EI}(x) = \begin{cases} \big(\mu(x) - f(x^+)\big)\,\Phi(Z) + \sigma(x)\,\phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases} \qquad Z = \frac{\mu(x) - f(x^+)}{\sigma(x)} \qquad (24)$$
where $\Phi$ and $\phi$ are the cumulative distribution function and the density of the standard normal distribution.
For the second-stage classifier, we use SVM and RandomForest. For the SVM, we use 10-fold cross-validation, apply class-balanced weights to handle the unbalanced dataset, and use sensitivity as the main evaluation metric. The search parameters are C and the kernel, with the search space C ∈ [0.1, 1000] and kernel ∈ {'linear', 'poly', 'rbf', 'sigmoid'}. The SVM classifier is controlled mainly by the penalty parameter C > 0, which is the upper bound on the dual coefficients of the soft-margin SVM; we vary C on the validation-set vectors to obtain the best results.
Next we consider the random forest model, which combines bagging with decision trees and builds multiple trees on random subsets of the data and features. A single decision tree generates its nodes and rules by computing the information gain or the Gini index, whereas a random forest injects randomness into this process. Deep decision trees tend to overfit, while a random forest, by building smaller trees on random subsets of features and combining them, avoids overfitting in most cases. In this experiment, the hyperparameter search space is N (the number of trees in the forest) ∈ [10, 500], MinSamplesSplit (the minimum number of samples required to split an internal node) ∈ [2, 100], MaxFeatures (the fraction of features considered when looking for the best split) ∈ [0.1, 0.999] and MaxDepth (the maximum depth of a tree) ∈ [5, 80]. We search for the best sensitivity by combining Bayesian optimization with the random forest model.
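Putting the pieces together, the loop of Algorithm 1 with a GP surrogate and the EI acquisition of Eq. (24) can be sketched as follows. This is a self-contained toy, not the paper's search: the 1-D objective stands in for validation sensitivity as a function of one hyperparameter, the acquisition is maximized over a fixed grid rather than by Monte Carlo sampling, observations are standardized before fitting, and the length scale 0.3 and all function names are assumptions.

```python
import numpy as np
from math import erf, sqrt, pi

_erf = np.vectorize(erf, otypes=[float])

def gp_predict(x_obs, y_obs, x_new, length=0.3, noise=1e-6):
    """Zero-mean GP with a unit-variance squared exponential kernel; returns mean and std."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)
    K = k(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = k(x_new, x_obs)
    mean = K_s @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 0.0))

def expected_improvement(mean, std, f_best):
    """EI = (mu - f+) Phi(Z) + sigma phi(Z) where sigma > 0, and 0 where sigma = 0."""
    ei = np.zeros_like(mean)
    pos = std > 0
    z = (mean[pos] - f_best) / std[pos]
    cdf = 0.5 * (1.0 + _erf(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    ei[pos] = (mean[pos] - f_best) * cdf + std[pos] * pdf
    return ei

def smbo(objective, low, high, n_init=3, n_iter=10, seed=0):
    """Sequential model-based optimization (maximization) with a GP surrogate and EI."""
    rng = np.random.default_rng(seed)
    x_obs = rng.uniform(low, high, n_init)  # initial design
    y_obs = np.array([objective(x) for x in x_obs])
    grid = np.linspace(low, high, 501)  # candidate points
    for _ in range(n_iter):  # T iterations
        m, s = y_obs.mean(), y_obs.std()
        s = s if s > 0 else 1.0  # standardize observations before fitting the GP
        mean, std = gp_predict(x_obs, (y_obs - m) / s, grid)
        ei = expected_improvement(mean * s + m, std * s, y_obs.max())
        x_next = grid[np.argmax(ei)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]

# Toy objective standing in for validation sensitivity as a function of one hyperparameter.
best_x, best_y = smbo(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
print(best_x, best_y)  # best_x is expected to be close to 0.3
```

For the SVM and random-forest searches described above, x would be a vector of hyperparameters (C and kernel, or N, MinSamplesSplit, MaxFeatures and MaxDepth) and each objective call would run cross-validation, which is why a sample-efficient search matters.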

Results
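The classification performance is evaluated in terms of accuracy (ACC), precision (PPV), sensitivity (SEN), specificity (SPE) and the area under the ROC curve (AUC) [Powers (2011)]. These common metrics are defined in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN):
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{PPV} = \frac{TP}{TP + FP}, \quad \mathrm{SEN} = \frac{TP}{TP + FN}, \quad \mathrm{SPE} = \frac{TN}{TN + FP}$$
… Fig. 5.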
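The metric definitions can be written as small functions. For multi-class results we assume one-vs-rest counting per class (the paper does not state its averaging scheme), and AUC is computed here with the rank (Mann-Whitney) formulation; all function names are hypothetical.

```python
import numpy as np

def binary_metrics(tp, tn, fp, fn):
    """ACC, PPV (precision), SEN (sensitivity/recall) and SPE (specificity) from counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PPV": tp / (tp + fp),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
    }

def one_vs_rest_counts(y_true, y_pred, cls):
    """TP, TN, FP, FN for one class treated as positive and all others as negative."""
    t, p = (y_true == cls), (y_pred == cls)
    return (int(np.sum(t & p)), int(np.sum(~t & ~p)),
            int(np.sum(~t & p)), int(np.sum(t & ~p)))

def auc_rank(scores, is_positive):
    """ROC AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos, neg = scores[is_positive], scores[~is_positive]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
counts = one_vs_rest_counts(y_true, y_pred, cls=1)
print(counts)  # (2, 3, 1, 0)
print(binary_metrics(*counts)["SEN"])  # 1.0
print(auc_rank(np.array([0.1, 0.4, 0.35, 0.8]), np.array([False, False, True, True])))  # 0.75
```

Sensitivity is the metric the ensemble search optimizes, so it is computed per class in the same one-vs-rest way before any averaging.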

Discussion
In this work, we implement a multi-dermatological classification system. We first separate dermoscopic from non-dermoscopic (vitiligo) images and then build separate models for dermoscopic and non-dermoscopic images. For the vitiligo model, we propose a vitiligo recognition approach based on the averaged probabilities of three CNN models with the same structure; compared with traditional image recognition methods based on the RGB color space, the proposed method is more effective and robust. For the dermoscopic model, we use weighted loss optimization to handle the unbalanced data, and we apply multi-crop evaluation on the validation and test sets. Finally, on the validation set we concatenate the output vectors of multiple models and use Bayesian optimization to search the hyperparameters of a classifier optimized for sensitivity. The classification of skin lesions deserves further study, and our work still has many shortcomings. In the future, we will continue to improve our classification methods, with the aim of helping doctors reduce the fatigue caused by diagnosis.