Skin Cancer Detection Based on Extreme Learning Machine and a Developed Version of Thermal Exchange Optimization

Melanoma is a disease that is incurable in its advanced stages, which shows the vital importance of timely diagnosis and treatment. To diagnose this type of cancer early, various methods and equipment have been used, almost all of which require a visit to the doctor and are not available to the general public. In this study, an automated and accurate process to differentiate between benign pigmented skin lesions and malignant melanoma is presented, so that it can be used by the general public without special equipment or special imaging conditions. After preprocessing of the input images, the region of interest is segmented based on the Otsu method. Then, a new feature extraction step is applied to the segmented image to mine the beneficial characteristics. The process is finalized by an optimized Extreme Learning Machine (ELM) classifier for categorization into the two classes of normal and melanoma cases. The optimization has been performed by a developed version of the newly introduced Thermal Exchange Optimization (dTEO) algorithm to obtain higher efficacy in different terms. To show the method's superiority, its performance is compared with 7 different techniques from the literature.


Introduction
Skin cancer is known as one of the deadliest types of cancer, and its incidence has recently grown exponentially worldwide. Nevertheless, it can be cured if it is detected in the early stages, and, in most cases, a simple biopsy can prevent the cancer from growing [1]. This cancer has been growing dramatically during recent years, making early therapy more important than ever. Melanoma ranks as the 3rd most common malignancy among skin cancers [2]. It alters the skin color as a result of the irregular behavior of the cells [3]. Notwithstanding this danger, it is curable if detected in the early stages. Even so, early differentiation of melanoma from other, benign skin moles is a thought-provoking mission. Any variation in the size, color, or shape of a skin mole can be considered an initial sign of this cancer.
Based on statistical information for 2019, skin cancer, with 15,000 new cases, ranks as the fourth most common cancer and, with 1,900 deaths, is among the ten most common causes of cancer death worldwide [4]. The most common approach to melanoma detection relies on physicians and their sampling and testing. However, in some cases, even experienced specialists make mistakes in melanoma detection. Furthermore, laboratory sampling and testing is a time-consuming and expensive procedure that bothers the patients. Therefore, designing a computer system that can detect malignant lesions is very useful. According to research, computer-aided systems can be more accurate than a specialist doctor. Recently, many research works have addressed automatic early detection of skin cancers. The high capability of medical imaging also helps medical doctors lessen the difficulty of this diagnosis and enhance the speed of the initial diagnosis of the disease. The application of Artificial Neural Networks (ANNs) in different parts of image processing and machine vision has been increasing. ANNs imitate the actions of the human brain and are used in a variety of applications, from medicine to economics and engineering [5]. An ANN is made up of many highly interconnected processing components called neurons, which solve problems through their interconnections. Artificial Neural Networks, like humans, learn by example to solve different kinds of problems, from pattern recognition to information classification, through a learning procedure.
Deep learning is a class of ANNs that uses mathematical techniques to build a structure resembling the human brain. Advances in technology have led to algorithms for optimizing ordinary neural networks so that the number of layers can be scaled up. Neural networks have grown from a few layers to thousands of layers with thousands of neurons per layer, a structure that could not have been built until a few years ago [6]. This type of neural network is called a deep learning network. Several methodologies using different kinds of ANNs and other machine learning techniques have been introduced for skin cancer diagnosis [7,8]. For instance, Li et al. [9] proposed a new data synthesis methodology to combine individual skin lesion images and heavily augment them to create noteworthy amounts of data. The study used a convolutional neural network (CNN) to provide superior efficiency over old-style detection and tracking methods. Moreover, the system was trained by humans with simple criteria.
Esteva et al. [10] presented a supervised deep learning technique for skin lesions. The method's efficiency was verified against twenty-one board-certified dermatologists on two critical binary classification tasks: malignant versus benign groups. Two cases were analyzed. The first one recognizes the most common cancers, and the second case concerns identification of the deadliest skin cancer. The final results indicated the high performance of the proposed method.
Jafari et al. [11] proposed a deep learning-based procedure to provide a precise segmentation of the skin cancer area. After image denoising and preprocessing, a CNN was utilized for segmentation. They combined local and global structure information and output a label for each pixel to create a segmentation mask that indicates the lesion area. This mask was then improved with some postprocessing operators. The empirical results indicated the superiority of the presented method compared to the other techniques.
Mohamed et al. [12] proposed a learning technique to categorize skin lesions for melanoma determination. The method was based on applying a CNN with multiple configurations. The suggested technique was evaluated with 14 layers on the International Skin Imaging Collaboration (ISIC) dataset. Final results indicated a successful ratio for the method in the segmentation of the skin cancer area.
Xu et al. [2] proposed an automatic diagnosis method for melanoma. The presented technique was based on an optimized image segmentation methodology along with a CNN optimized by satin bowerbird optimization (SBO). After segmentation, feature extraction was implemented on the processed image. The main features were then selected using the SBO algorithm to trim redundant information. At last, the results were produced by applying an SVM to separate two groups of normal and melanoma cases. The method's results were authenticated by comparison with other techniques from the literature to indicate its performance.
As the research background shows, the application of deep learning, including CNNs, in medical image processing, especially in skin lesion diagnosis, has been increasing exponentially. The present study proposes an optimized technique of skin lesion segmentation using a new design of the Thermal Exchange Optimization algorithm, which has been used to reach this purpose with higher performance. The overall flow of the proposed technique is shown in Figure 1. Therefore, the main contributions of the present study can be highlighted as follows:

Skin Cancer Preprocessing
Medical imaging is the method and process used to create images of the human body to examine or advance medical science, and it is widely used in medicine [13]. Diagnostic images are the subject of many scans, examinations, and images used in medicine. Research has shown that medical images are highly susceptible to noise and other degradations such as low contrast. Two simple preprocessing steps address this here.

Noise Reduction.
Image noise removal is a significant part of all medical image processing. This process tries to reduce or remove the noise from the original image. Components that introduce noise exist in all recording devices [14]. The noise may be white or random noise. Noise reduction restores the noisy image to its original model based on ideal models that describe how the degradation occurred. The noise has a major impact on the image in certain situations, particularly in image edge identification, which amplifies the influence of high-frequency pixels that specifically contain noise [15,16]. The noise in medical images is typically Gaussian noise. Gaussian noise is dispersed across the image and is located between the main image pixels, which creates inconsistencies for subsequent operations; because the processing must be done only on the main image pixels, proper noise elimination is required. Noise pixels are usually recognized as pixels that differ from their neighbors in terms such as light intensity and transparency. The Wang-Mendel algorithm is an efficient fuzzy-based noise removal tool [17]. Because of the simple conception of fuzzy techniques, they are popular in different applications. Furthermore, because of their speed as an early-stage process, they are very useful for producing an initial fuzzy model [18]. In this method, the input and output datasets show the behavior of the solved state of the problem. The rule database is produced as follows: (1) Consider a fuzzy partition of the space of the input variables, obtained from expert knowledge along with normalization; the space is then split into equal or unequal sections by a fuzzy-based method for partitioning the input variable space. Then a form of the membership function is chosen, and each component is given as a fuzzy set; afterward, the membership function and fuzzy set are established for all the parts. (2) A set of candidate linguistic rules is produced by choosing the most inclusive rules for the samples. (3) A validity degree is assigned to the rules, obtained by multiplying the membership function values of all the components. (4) The final rule database is obtained from the set of candidate linguistic rules by categorizing them into groups, where each group contains the candidate rules with the same antecedent. The final rule is obtained by selecting the rule with the highest degree in each group.
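The four rule-generation steps above can be sketched in code. The following is an illustrative one-input, one-output sketch; the triangular membership functions, the number of fuzzy sets, and the toy data are assumptions for demonstration, not the paper's exact configuration:

```python
import numpy as np

def triangular_membership(x, centers, width):
    """Membership degree of x in each triangular fuzzy set."""
    return np.maximum(0.0, 1.0 - np.abs(x - centers) / width)

def wang_mendel_rules(X, y, n_sets=5):
    """Generate a Wang-Mendel rule base from (input, output) samples.

    Each sample votes for one candidate rule: the fuzzy set with the
    highest membership for the input and for the output.  Conflicting
    rules (same antecedent) are resolved by keeping the rule with the
    highest validity degree (product of memberships).
    """
    x_centers = np.linspace(X.min(), X.max(), n_sets)
    y_centers = np.linspace(y.min(), y.max(), n_sets)
    xw = (X.max() - X.min()) / (n_sets - 1)
    yw = (y.max() - y.min()) / (n_sets - 1)

    rules = {}  # antecedent set index -> (consequent index, degree)
    for xi, yi in zip(X, y):
        mx = triangular_membership(xi, x_centers, xw)
        my = triangular_membership(yi, y_centers, yw)
        a, c = int(np.argmax(mx)), int(np.argmax(my))
        degree = float(mx[a] * my[c])          # rule validity degree
        if a not in rules or degree > rules[a][1]:
            rules[a] = (c, degree)
    return rules, x_centers, y_centers

# toy usage: learn the identity mapping on [0, 1]
X = np.linspace(0, 1, 50)
rules, xc, yc = wang_mendel_rules(X, X)
```

For the identity mapping, every antecedent set should map to the matching consequent set, which is a quick sanity check on the rule-resolution step.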

Image Contrast Enhancement.
Dermoscopy images typically lack high contrast, which makes them difficult to process in subsequent stages. The problem arises from multiple circumstances, such as the poor quality of measuring instruments and cameras, a low degree of user skill in photography, environmental factors, and the prevalence of noise. Sometimes, crucial details in the images disappear for the above reasons, rendering the processing too complex [19]. Contrast enhancement is a method for addressing the problems of contrast consistency. In this research, image contrast enhancement is used to help reveal the specifics of the cancer regions. Global contrast enhancement using a lookup table is applied here. For categorizing and storing the received images on disk, an 8-bit lookup table is used. The strategy is usually developed as follows [20]:

PDF_out = (PDF_In − PDF_min) / (PDF_Max − PDF_min),

where PDF_In and PDF_out represent the probability density function of the input and the output corrected images, respectively, and PDF_min and PDF_Max represent the minimum and the maximum probability density levels, respectively. Figure 2 shows an example of the preprocessed image for more clarification.
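A minimal sketch of lookup-table-based global contrast stretching is shown below; the linear remapping of the 256 gray levels is an assumption consistent with the normalization described above, not the paper's exact transfer function:

```python
import numpy as np

def stretch_lut(img, out_min=0, out_max=255):
    """Global contrast enhancement with an 8-bit lookup table.

    Each of the 256 possible gray levels is remapped linearly so that
    the darkest/brightest levels present in the image span the full
    output range, then applied with a single table lookup per pixel.
    """
    lo, hi = int(img.min()), int(img.max())
    levels = np.arange(256, dtype=np.float64)
    lut = (levels - lo) / max(hi - lo, 1) * (out_max - out_min) + out_min
    lut = np.clip(lut, out_min, out_max).astype(np.uint8)
    return lut[img]   # table lookup, one indexing operation per pixel

img = np.array([[50, 100], [150, 200]], dtype=np.uint8)
enhanced = stretch_lut(img)
```

After stretching, the darkest input level (50) maps to 0 and the brightest (200) maps to 255.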

Color Space.
RGB is the most common and basic color space in image processing and is usually used for displaying images. Due to the interdependence of the three base colors in the RGB color space, as well as their dependence on the intensity of ambient light, this color space is less used for the main processing. To compensate for this shortcoming, the XYZ color model has been employed. This color space is a quantifiable link between the physiologically perceived colors in human color vision and the wavelength distributions in the visible electromagnetic spectrum. In this color space, the values of X and Z give color information, and Y defines the luminance. To convert from RGB to XYZ, the following standard linear transformation has been used [21]:

X = 0.4124 R + 0.3576 G + 0.1805 B,
Y = 0.2126 R + 0.7152 G + 0.0722 B,
Z = 0.0193 R + 0.1192 G + 0.9505 B.

The most significant benefit of the XYZ color model is that it is completely independent of the device.
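The conversion above is a single matrix multiplication per pixel. The sketch below uses the standard sRGB (D65) linear-RGB-to-XYZ matrix; whether the paper applies any gamma linearization first is not stated, so the input is assumed to be linear RGB in [0, 1]:

```python
import numpy as np

# standard sRGB (D65) linear-RGB -> XYZ conversion matrix
RGB2XYZ = np.array([[0.4124, 0.3576, 0.1805],
                    [0.2126, 0.7152, 0.0722],
                    [0.0193, 0.1192, 0.9505]])

def rgb_to_xyz(img):
    """Convert an HxWx3 linear-RGB image (values in [0, 1]) to XYZ."""
    return img @ RGB2XYZ.T   # matrix product applied per pixel

white = np.ones((1, 1, 3))   # pure white reference pixel
xyz = rgb_to_xyz(white)
```

For a white pixel the luminance channel Y equals 1.0 (the matrix rows are normalized so that white maps to the D65 white point).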

Method of Segmentation.
In dermoscopy images, the red channel is the dimension of the RGB color space that most nearly gives the strength of the images. As mentioned before, both the X and Z values give color information in the XYZ color space. Therefore, the red (R) dimension and the X dimension are simply normalized for the segmentation; that is [21],

r = R / (R + G + B),   x = X / (X + Y + Z).

These normalizations are applied to every pixel of the input images. After this normalization, Otsu thresholding is used to give a segmentation with low time complexity. A global threshold has a problem when the image background resolution is inadequate; a local threshold can be used to eliminate the heterogeneity effect. Here, this problem is solved by preprocessing the image to eliminate inhomogeneities and then applying a global threshold to the processed image. In the Otsu technique, the threshold level is exhaustively searched to minimize the within-class variance, which is expressed as follows:

σ²_w(t) = ω_1(t) σ²_1(t) + ω_2(t) σ²_2(t),

where ω_i denotes the probability of the two separate classes for a threshold value of t and σ²_i is the variance of each class. In fact, Otsu showed that minimizing the within-class variance is equivalent to maximizing the between-class variance:

σ²_b(t) = ω_1(t) ω_2(t) [m_1(t) − m_2(t)]²,

where σ²_i describes the variance and m_i signifies the mean value of class i, both of which can be updated iteratively. This technique is briefly given as follows: (i) Evaluate the image histogram and its intensity level probability. (ii) Give initial values of ω_i(0) and m_i(0) for all levels of possible threshold (t = 1, 2, . . .).
Then, morphology operations including closing, opening, and filling are used for better results. In this stage, by applying the mathematical filling operator, the extra image holes are filled. This operator is as follows [22]:

X_k = (X_{k−1} ⊕ B) ∩ A^c,   k = 1, 2, 3, . . . ,

where A and B represent the area and the structuring element, respectively. Then, the opening operation is applied to the filled image to eliminate small details without changing the other gray surfaces. This operator is as follows:

A ∘ B = (A ⊖ B) ⊕ B.

Finally, to connect the narrow parts, the closing operation is performed based on the following equation:

A • B = (A ⊕ B) ⊖ B.

Here, the structuring element is set as a 5 × 5 identity matrix. Algorithm 1 shows the pseudocode of the image segmentation. Figure 3 displays some samples of image segmentation using the suggested segmentation.
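The core of the segmentation step, the exhaustive Otsu search, can be sketched as below. This is a minimal NumPy implementation of the between-class-variance maximization described above; the toy bimodal image is an assumption for demonstration:

```python
import numpy as np

def otsu_threshold(gray):
    """Exhaustively search the threshold that maximises the
    between-class variance (equivalently, minimises the weighted
    within-class variance)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                      # intensity probabilities
    best_t, best_sigma_b = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()      # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * p[:t]).sum() / w0          # class means
        m1 = (np.arange(t, 256) * p[t:]).sum() / w1
        sigma_b = w0 * w1 * (m0 - m1) ** 2     # between-class variance
        if sigma_b > best_sigma_b:
            best_t, best_sigma_b = t, sigma_b
    return best_t

# bimodal toy image: dark background, bright "lesion" square
img = np.zeros((10, 10), dtype=np.uint8)
img[3:7, 3:7] = 200
t = otsu_threshold(img)
mask = img >= t
```

On this two-level image any threshold between the modes yields the same between-class variance, so the mask recovers exactly the 16 bright pixels; morphology (filling, opening, closing) would then be applied to such a mask.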

Features Extraction
Feature extraction is the process of collecting more detailed information from the image. The feature extraction in this part of the processing is utilized to extract the main features of the segmented skin cancer area and so simplify a successful diagnosis. There are numerous candidate features. This research adopts statistical, texture, and geometric features for this purpose. Table 1 gives the features utilized here.
In the above table, MN describes the size of the image, B_p describes the length of the external side of the pixel boundary, p(i, j) represents the intensity value of the pixel at position (i, j), μ defines the mean value, σ describes the standard deviation, and a and b define the major axis and the minor axis, respectively. However, some of the achieved features are not useful for the process and just increase the method's complexity. To prevent this issue, a feature selection procedure based on an optimized technique has been utilized. To give optimal feature selection results, the following cost function has been considered:

Cost = (F_P + F_N) / (T_P + T_N + F_P + F_N),

where F_P describes the false positives, F_N describes the false negatives, T_P defines the true positives, and T_N defines the true negatives. The idea is to minimize the above equation based on a developed optimization method.
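A candidate feature subset can be scored with this cost as sketched below. The nearest-centroid classifier and the toy data are stand-ins chosen for brevity; the paper's own classifier and optimizer would replace them:

```python
import numpy as np

def selection_cost(mask, X, y):
    """Cost of a candidate feature subset: the misclassification rate
    (FP + FN) / (TP + TN + FP + FN) of a nearest-centroid classifier
    trained on the selected columns only."""
    if not mask.any():
        return 1.0                      # selecting nothing: worst cost
    Xs = X[:, mask]
    c0 = Xs[y == 0].mean(axis=0)        # class centroids
    c1 = Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1)
            < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return float(np.mean(pred != y))    # = (FP + FN) / total

rng = np.random.default_rng(0)
# feature 0 is informative, feature 1 is pure noise
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.1 * rng.standard_normal(100),
                     rng.standard_normal(100)])
good = selection_cost(np.array([True, False]), X, y)
```

An optimizer such as the dTEO described later would search over the binary mask to minimize this cost; here the informative-feature subset scores near zero.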

Extreme Learning Machine (ELM)
One of the neural network models that has recently received much consideration is the ELM model. Some of the advantages of ELMs are their high learning speed, their ease of use, and their ability to work with numerous activation functions and nonlinear kernel functions. The ELM model provides an integrated template with a wide variety of feature transformations in the hidden layer, which can be utilized directly for regression and categorization. This algorithm is a learning method for Single-Hidden-Layer Neural Networks (SHNN) based on initializing the input biases and weights randomly and then evaluating the output weights. Thus, the network can be trained in just a few steps. Figure 4 shows the arrangement of a simple ELM network.
For a classification problem with dimension D and N training samples [23], where x^(n) ∈ R^D and t^(n) ∈ R^K, an ELM-based feed-forward neural network with L hidden neurons can be formulated as follows:

Σ_{i=1}^{L} β_i g(w_i · x^(n) + b_i) = t^(n),   n = 1, . . . , N,

where w_i and b_i are the randomly initialized input weights and bias of the i-th hidden neuron, g(·) is the activation function, and β_i is its output weight. This conception can be written in matrix form as

H β = T,

where H is the matrix of hidden-neuron outputs. Meanwhile, H is a nonsquare matrix, since the number of training samples is bigger than the number of hidden neurons. Therefore, to solve this system, the following formulation has been used:

β = H† T,   (16)

where H† describes the generalized Moore-Penrose matrix inverse. Thus, ELM training can be shortened to three main steps: (1) initializing the input weights and biases with random values; (2) evaluating the hidden neurons' output matrix H; (3) evaluating the output weight matrix β by equation (16).
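The three training steps above can be sketched directly with the pseudoinverse. This is a minimal NumPy ELM on an assumed toy two-cluster problem, not the paper's full classifier:

```python
import numpy as np

def elm_train(X, T, n_hidden, rng):
    """Train an ELM: random input weights/biases, then solve the
    output weights with the Moore-Penrose pseudoinverse (beta = H+ T)."""
    W = rng.standard_normal((X.shape[1], n_hidden))   # step 1: random weights
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # step 2: hidden outputs
    beta = np.linalg.pinv(H) @ T                      # step 3: beta = H+ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

rng = np.random.default_rng(1)
# two linearly separable clusters with one-hot targets
X = np.vstack([rng.normal(-2, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
T = np.vstack([np.tile([1, 0], (50, 1)), np.tile([0, 1], (50, 1))])
W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
acc = float(np.mean(pred == np.repeat([0, 1], 50)))
```

Because only the linear output layer is solved, training is a single least-squares step; on separable data the training accuracy is essentially perfect.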

The Developed Thermal Exchange Optimization Algorithm
Optimization is the process of using specific techniques to solve optimization problems. In recent years, the performance of classic optimization methods has decreased, or they have even failed, due to the increasing complexity and nonlinearity of the problems [24]. Recently, several methods have been introduced for resolving these problems [25]. Metaheuristics are one class of these methods that have become popular due to their simplicity and suitable results in various problems. Metaheuristic algorithms are derived from various phenomena in nature, human society, animal hunting behavior, and so forth. Several algorithms have been proposed in this field [26][27][28][29], for example, World Cup Optimizer [30], Ant Lion Optimizer (ALO) [31], Chimp Optimization Algorithm [32], Harris Hawks Optimization [33], and the mayfly optimization algorithm [34]. In the present research, a novel modified model of the Thermal Exchange Optimization (TEO) algorithm is presented to achieve optimal results for the considered methodology [5].
The TEO algorithm is a novel metaheuristic technique inspired by the temperature of objects and the way their positions are switched between cold and warm roles, indicating the updated positions. In the following, more explanation about this algorithm is provided.

The Newton Cooling Law.
Tests indicated that the rate of cooling is roughly proportional to the temperature difference between the environment and the heated object.
This is formulated as in the following equation:

dQ/dt = α A (T_b − T_s),

where A describes the area of the body surface that transmits heat, Q signifies the heat, α defines the coefficient of heat transfer, which depends on factors such as object geometry, heat transfer mode, and surface state, and T_b and T_s represent the temperature of the body and the ambient temperature, respectively. The heat lost in time dt is α × A × (T_s − T) dt, which equals the variation in stored heat as the temperature drops by dT; in other words,

α A (T_s − T) dt = d c V dT,

where d signifies the density (kg/m³), c defines the specific heat (J/kg/K), and V describes the volume (m³). So, by considering a constant ζ = α A / (d c V), the solution is

T(t) = T_s + (T_M − T_s) e^{−ζ t},

where T_M describes the initial high temperature.

The Algorithm.
In Thermal Exchange Optimization, some of the candidates are assumed to be the cooling objects and the remaining candidates are assumed to be the environment; afterward, the reverse operation is done. In the TEO algorithm, like any other metaheuristic algorithm, the candidates are first generated randomly. The initial temperature for each object is formulated by the following equation:

T_i^0 = T_min + rnd · (T_max − T_min),

where rnd signifies a random vector limited between 0 and 1, T_i^0 describes the initial solution vector of the i-th object, and T_min and T_max represent the minimum and maximum limits of the candidates. The randomly generated candidates are then passed to the cost function to evaluate their cost values. After that, the positions of some of the best candidate vectors are saved in a Thermal Memory (TM) to be used later for improving the algorithm's efficacy at a lower computational cost. Afterward, these best TM candidates are injected into the population and an equal number of candidates with the worst values are eliminated. The candidates are separated into two equal groups, as shown in Figure 5.
For clarification, T_1 is the environment object for the cooling object T_{n/2+1}, and so on, inversely. If an object has a lower cost value, its temperature exchange is performed more slowly. Therefore, ζ is obtained from the following formula:

ζ = cost(object) / cost(worst object).

Another important term in the algorithm is time, which is associated with the iteration count. This parameter is formulated as follows:

t = iteration / maximum iteration.

To give the algorithm an exploration term, the change in the environmental temperature is modeled, as given in the following equation:

T_e_i^new = (1 − (m_1 + m_2 · (1 − t)) · rnd) · T'_e_i,

where m_1 and m_2 describe the control variables and T'_e_i signifies the object's earlier temperature, which is modified to T_e_i^new. Considering the former models, the new temperature of each object is updated as follows:

T_i^new = T_e_i + (T_i^old − T_e_i) e^{−ζ t}.

The next term is Pr, which is used to indicate whether a cooling object's element should be altered or not. The Pr components are compared with R(i) (i = 1, 2, . . ., n), which has a randomly distributed value between 0 and 1. If R(i) < Pr, one dimension of the i-th candidate is randomly chosen and its value is reformulated by the following:

T_{i,j} = T_min^j + rnd · (T_max^j − T_min^j),

where T_{i,j} determines the j-th variable of the i-th candidate and T_min^j and T_max^j are the lower and upper limits of variable j, respectively. Finally, the algorithm terminates when the stopping criteria are met.
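The update loop above can be sketched as follows. This is a simplified TEO sketch minimizing a sphere function; the control-variable values (0.3 and 0.7), the population size, and the omission of the thermal memory and Pr mechanism are simplifying assumptions:

```python
import numpy as np

def teo_minimize(cost, dim, n_pop=20, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal Thermal Exchange Optimization sketch.

    The better half of the population acts as "environment" objects for
    the worse half; each cooling object moves toward its environment
    object at a rate exp(-zeta * t), where zeta is its normalised cost."""
    rng = np.random.default_rng(seed)
    T = lo + rng.random((n_pop, dim)) * (hi - lo)  # T0 = Tmin + rnd (Tmax - Tmin)
    for it in range(1, iters + 1):
        f = np.array([cost(x) for x in T])
        order = np.argsort(f)                       # sort: best first
        T, f = T[order], f[order]
        t = it / iters                              # normalised time
        half = n_pop // 2
        worst = f[-1] if f[-1] != 0 else 1.0
        for i in range(half, n_pop):
            env = T[i - half]                       # paired environment object
            zeta = f[i] / worst                     # slower exchange for good objects
            # perturb the environment temperature (exploration term)
            env = (1 - (0.3 + 0.7 * (1 - t)) * rng.random()) * env
            T[i] = env + (T[i] - env) * np.exp(-zeta * t)
        T = np.clip(T, lo, hi)
    f = np.array([cost(x) for x in T])
    return T[np.argmin(f)], float(f.min())

best_x, best_f = teo_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
```

On the 3-dimensional sphere function the population contracts toward the origin, so the returned cost is far below that of a random initial candidate.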

Developed TEO Algorithm.
Although the TEO algorithm has numerous benefits in finding the global solution, it also has some drawbacks that should be resolved. The main drawbacks of the algorithm are its local-minimum results and low convergence speed in some cases. Numerous improvements have been introduced for improving the exploration of metaheuristics. In this research, chaos theory has been utilized as the first improvement. The reason for using chaos in this algorithm is to improve the algorithm's ability to escape local optima, followed by an advanced speed of convergence. In the considered dTEO algorithm, a chaotic sequence rnd_k is substituted for the uniform random value "rnd," i.e.,

rnd_{k+1} = 4 · rnd_k · (1 − rnd_k),

where k signifies the iteration number. The other improvement, which gives a proper trade-off between exploration and exploitation in the algorithm, is the Gaussian mutation mechanism, based on the Gaussian PDF [35]:

f(x) = (1 / (σ √(2π))) exp(−(x − μ)² / (2σ²)),

where σ² signifies the variance of the Gaussian PDF and μ determines the expectation of the Gaussian distribution. This mutation is applied to the considered location by renewing the candidate of the TEO algorithm (T_{i,j}^new) as follows:

T_{i,j}^new = T_{i,j} + c · g(0, 1),

where c describes a decreasing random value between 0 and 1 and g(0, 1) describes the standard Gaussian distribution.

Algorithm Authentication.
In this section, the proposed developed Thermal Exchange Optimization (dTEO) algorithm stated in the previous section has been programmed in the MATLAB 2017b environment and validated on a machine with 16 GB RAM and a Core™ i7-4720HQ 1.60 GHz processor. The proposed dTEO algorithm is validated by applying it to some standard test cases. The results are also compared with some other approaches, namely, Multiverse Optimizer (MVO) [36], Locust Swarm Optimization (LSO) [37], Spotted Hyena Optimizer (SHO) [38], and the original TEO [39], to show the proposed method's efficacy. Table 2 shows the details of the studied functions.
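The two dTEO modifications can be sketched in isolation. The logistic map below is a common chaotic generator and is an assumption about the exact map used; the mutation follows the additive form described above:

```python
import numpy as np

def logistic_chaos(r0, k):
    """Chaotic sequence replacing the uniform rnd: r_{k+1} = 4 r_k (1 - r_k).

    For r0 in (0, 1) the iterates stay inside [0, 1] while behaving
    pseudo-randomly, which is what lets chaos stand in for rnd."""
    r = r0
    for _ in range(k):
        r = 4.0 * r * (1.0 - r)
    return r

def gaussian_mutation(T_row, c, rng):
    """Perturb a candidate with standard-Gaussian noise scaled by a
    decreasing factor c in (0, 1): T_new = T + c * g(0, 1)."""
    return T_row + c * rng.standard_normal(T_row.shape)

rng = np.random.default_rng(2)
r = logistic_chaos(0.7, 100)              # chaotic "random" draw
mutated = gaussian_mutation(np.zeros(4), 0.5, rng)
```

In the full dTEO these two pieces replace the uniform rnd draws and add a mutation step after the temperature update, respectively.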
Table 3 gives the parameter settings of all studied algorithms used in this study.
To determine the performance of the algorithms, they were run on the benchmark functions, and two measurement indicators, the minimum value and the standard deviation, were extracted from the results to show the algorithms' accuracy and reliability. It should be noted that, for all algorithms, the population size and the number of iterations are set to 120 and 100, respectively. The simulations were repeated 20 times to give consistent results. Table 4 illustrates the simulation results.
As can be seen from Table 4, the suggested dTEO attains the lowest minimum value on the minimization problems. This means that the suggested method provides the best accuracy among the compared state-of-the-art methods on the studied functions. In addition, the suggested method also achieves the smallest standard deviation, which shows that it has the highest reliability among the comparative techniques in this study. So, we can conclude that using this algorithm in this study is well justified in terms of accuracy and reliability.

Optimized ELM Network Based on the dTEO Algorithm
The classification of the final extracted features is performed by the proposed optimized ELM network. The main task of the ELM classifier here is to assign each skin cancer dermoscopy image to one of two groups: the normal and malignant melanoma classes.
In this study, the activation functions of the network are considered for enhancement. Generally, the significant effect of the activation function on ANNs is clear, because a network with varied activation functions gives good generalization capacities. This study considers a sigmoid activation function, one of the popular functions in this area. The formulation of this function is as follows:

g(x) = 1 / (1 + exp(−(as · x + bs))).
To improve the efficiency of the ELM-based classifier, different sigmoid functions have been utilized in the hidden neurons. In other words, different values are utilized for as and bs. Figure 6 displays the impact of as and bs on the function shape.
In other words, the problem of finding suitable parameters of the sigmoid function can be considered as an optimization problem. Here, the developed Thermal Exchange Optimization algorithm is used to solve this issue. Since "as" and "bs" have almost equal effects on the function shape, only "bs" is used for optimization of the network. Therefore, two optimization policies have been used. The input weights are first initialized. Then, the developed Thermal Exchange Optimization algorithm is utilized for optimal parameter selection given the initial weights. The objective function of the developed TEO is the squared error between the desired values and the network outputs:

E = Σ_{j=1}^{n} Σ_{i=1}^{k} (y_{ji} − d_{ji})²,

where n and k describe the numbers of training samples and output neurons, respectively, and y_{ji} and d_{ji} represent the output of the network and the desired value, respectively. The primary target of the present study is to utilize the mentioned optimized ELM network for the diagnosis of skin cancer. The optimization uses a newly designed metaheuristic, namely, the developed Thermal Exchange Optimization algorithm.
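The bs-selection step can be sketched with a single sigmoid unit and the squared-error objective above. A plain grid search stands in here for the dTEO optimizer, and the target data are an assumption generated with a known offset:

```python
import numpy as np

def sse(y, d):
    """Squared-error objective between network output y and target d."""
    return float(np.sum((y - d) ** 2))

def best_bs(X, d, a_s=1.0, candidates=np.linspace(-3, 3, 61)):
    """Grid-search stand-in for the dTEO step: pick the sigmoid offset
    bs that minimises the squared error of a single sigmoid unit
    g(x) = 1 / (1 + exp(-(a_s * x + bs)))."""
    costs = [sse(1.0 / (1.0 + np.exp(-(a_s * X + bs))), d)
             for bs in candidates]
    return float(candidates[int(np.argmin(costs))])

X = np.linspace(-2, 2, 40)
d = 1.0 / (1.0 + np.exp(-(X + 1.5)))    # target generated with bs = 1.5
bs_hat = best_bs(X, d)
```

Because the target is realizable, the search recovers the generating offset bs = 1.5; the dTEO would perform the same minimization over a continuous search space.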

Results and Discussion
In this paper, a new optimized methodology is used to detect skin cancer from dermoscopy images. In the presented dTEO-based ELM classifier, 80% of the data are used for training and the remaining 20% for testing. The training step of the network runs for 700 iterations. The training process has been repeated 15 times to achieve a confident result. The validation has been performed using 5 measurement indexes: specificity, sensitivity, accuracy, PPV, and NPV, which are given as follows:

Accuracy = (T_P + T_N) / (T_P + T_N + F_P + F_N),
Sensitivity = T_P / (T_P + F_N),
Specificity = T_N / (T_N + F_P),
PPV = T_P / (T_P + F_P),
NPV = T_N / (T_N + F_N).

The results are compared with 7 other methods, fractal analysis [40], CNN [41], Delaunay Triangulation [42], Side-by-Side method [43], Genetic Algorithm [44], fusion method [45], and SVM [46], to verify the method's performance in different terms. Table 5 tabulates the performance validation for the suggested technique compared with the other methods. The results show the mean value of 30 runs for each method.
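The five validation indexes can be computed directly from the confusion counts; the example counts below are illustrative, not the paper's reported results:

```python
def classification_metrics(tp, tn, fp, fn):
    """The five validation indexes, computed from confusion counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "PPV":         tp / (tp + fp),   # positive predictive value
        "NPV":         tn / (tn + fn),   # negative predictive value
    }

# illustrative confusion counts for a 100-image test split
m = classification_metrics(tp=45, tn=50, fp=2, fn=3)
```

With these counts the accuracy is (45 + 50) / 100 = 0.95, while sensitivity is 45 / 48, showing that the indexes capture different aspects of the same confusion matrix.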
As concluded from Table 5, the suggested method, with 95% accuracy, provides the highest accuracy among the compared techniques. Also, with 95% sensitivity, which is the highest among the methods, the reliability of the proposed method is confirmed. The same holds for NPV, PPV, and specificity relative to the others. The better NPV and PPV results indicate a higher ability to determine the probability that a test correctly diagnoses cancer, and the better specificity and sensitivity of the suggested technique show higher prevalence-independent performance. For more clarification, the results of Table 5 are shown in Figure 8.

Conclusions
In recent years, skin cancer has been recognized as one of the most dangerous and common types of cancer in humans.
There are different types of skin cancer. Melanoma is a common type, for which early detection can be helpful in treatment and can significantly prevent death from this deadly cancer. Designing an approach that facilitates skin cancer detection in the early stages is therefore very useful and valuable. In this study, an optimized pipeline procedure was utilized for the optimal detection of melanoma from dermoscopy images. In the proposed method, after preprocessing of the input dermoscopy images based on noise reduction and contrast enhancement, the region of interest was segmented. Afterward, feature extraction was implemented on the segmented images to extract useful features from them. Finally, an optimized Extreme Learning Machine (ELM) network was utilized to separate the images into two classes of healthy and cancerous. The optimization of the ELM used a new metaheuristic method, the developed Thermal Exchange Optimization algorithm, to improve the network efficiency in terms of reliability and accuracy. So, the main contribution of the proposed method is the use of a newly developed version of the newly introduced Thermal Exchange Optimization for the diagnosis of malignant melanoma. The main advantage of this technique is that, as the results showed, it improved the system's efficiency in terms of both accuracy and precision, in other words, its reliability across different runs. The performance of the proposed technique was authenticated by comparing it with 7 other methods: fractal analysis, CNN, Delaunay Triangulation, Side-by-Side method, Genetic Algorithm, fusion method, and SVM. Simulation results showed that, based on several performance indexes, the suggested procedure gave the best results against the compared techniques. The main limitation of the proposed method is that, because it combines several soft-computing techniques, it requires considerable computation time.
In future work, we will design a way to turn the method from a theoretical methodology into a real-time system for practical applications.