Enhancement of ELM by Clustering Discrimination Manifold Regularization and Multiobjective FOA for Semisupervised Classification

A novel semisupervised extreme learning machine (ELM) with a clustering discrimination manifold regularization (CDMR) framework, named CDMR-ELM, is proposed for semisupervised classification. By using an unsupervised fuzzy clustering method, the CDMR framework integrates the clustering discrimination of both labeled and unlabeled data with twinning constraints regularization. To further improve classification accuracy and efficiency, a new multiobjective fruit fly optimization algorithm (MOFOA) is developed to optimize the crucial parameters of CDMR-ELM. The proposed MOFOA is implemented with two objectives: simultaneously minimizing the number of hidden nodes and the mean square error (MSE). Experimental results on real datasets show that the proposed semisupervised classifier obtains better accuracy and efficiency with relatively few hidden nodes compared with other state-of-the-art classifiers.


Introduction
Recently, ELM [1,2] has shown better performance than traditional gradient-based learning methods and the support vector machine (SVM) [3,4] in regression and classification applications owing to its faster learning. However, because ELM is a supervised learning algorithm, its applicability is seriously restricted [5]. In practical applications, unlabeled data are easy to obtain, whereas acquiring labeled data is difficult and time consuming. It is therefore imperative to extend ELM to semisupervised classification.
Manifold regularization is a frequently used semisupervised learning method based on the smoothness assumption [6]. LapRLS [7] and LapSVM [8,9], both based on the manifold assumption, are frequently used semisupervised learning algorithms. However, manifold regularization is prone to misclassification in the boundary area between clusters, because boundary instances in the manifold structure are likely to belong to different classes [10]. Wu et al. [11] proposed semisupervised discrimination regularization (SSDR), which reduces misclassification by utilizing the discrimination of labeled data during learning. However, because labeled data are scarce, the improvement is limited. Wang et al. [12] proposed discrimination-aware manifold regularization (DAMR), in which the discrimination of the whole data is considered to improve accuracy. Yet DAMR adopts only binary cluster labels, which are insufficient for multiclass problems. In view of this, an improved manifold regularization framework named clustering discrimination manifold regularization (CDMR) is proposed, which integrates the clustering discrimination of both labeled and unlabeled data with twinning constraints regularization, and a semisupervised ELM with the CDMR framework, termed CDMR-ELM, is developed. The proposed framework can effectively avoid the boundary misclassification that frequently occurs in manifold regularization, and it improves classification accuracy by combining clustering discrimination with twinning constraints regularization containing lower intracluster compactness and higher intercluster separability.
FOA is a global optimization method based on the food-finding behavior of fruit flies; it is simple and easy to understand [13,14]. This paper develops an improved variant of FOA, named the multiobjective fruit fly optimization algorithm (MOFOA), to optimize the crucial parameters of CDMR-ELM, namely the number of hidden nodes and the trade-off parameters, in order to further improve classification accuracy and efficiency. MOFOA employs MSE to evaluate the fitness function and adopts an adaptively reduced search area for the decision variables to reduce the risk of sinking into local extrema and of premature convergence [15,16]. Above all, unlike the traditional FOA-ELM, which runs its optimization iterations with a fixed number of hidden nodes, MOFOA is based on two objectives, simultaneously minimizing the number of hidden nodes and the MSE. It thereby obtains a set of optimal parameters that increase classification accuracy while using fewer hidden nodes, which reduces computational complexity and enhances efficiency.
The rest of this paper is organized as follows. Section 2 introduces the related basic theory. Section 3 proposes the novel CDMR framework and integrates it with ELM. Section 4 presents MOFOA for optimizing the parameters of CDMR-ELM. The experimental setup and comparison results are given in Section 5. Section 6 concludes the paper.

Related Basic Theory
Consider training samples $(x_i, y_i) \in \mathbb{R}^n \times \mathbb{R}^m$, where $n$ and $m$ represent the dimensions of the input and output vectors. The output of ELM with respect to sample $x_i$ is determined as follows [17]:
$$f(x_i) = \sum_{j=1}^{L} \beta_j h_j(x_i) = h(x_i)\beta, \tag{1}$$
where $L$ is the number of hidden nodes, $h(\cdot)$ is the hidden layer output function, and $\beta_j$ is the output weight connecting the $j$th hidden node to the output layer. The input weights and biases of the hidden nodes are randomly assigned in advance. Equation (1) can be converted into a compact form as follows:
$$f = H\beta, \tag{2}$$
where $H$ is the output matrix of the hidden layer. By minimizing the square loss of the prediction error and the norm of the output weight, ELM obtains the optimal output weight analytically as follows:
$$\beta^{*} = H^{\dagger} Y, \tag{3}$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ and $Y$ is the matrix of target outputs.
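The ELM training procedure described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the sigmoid activation, the uniform range for the random hidden parameters, the toy XOR data, and the function names `train_elm` and `predict_elm` are all assumptions made for the example.

```python
import numpy as np

def train_elm(X, Y, n_hidden, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn once and never trained (assumed range [-1, 1]).
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # hidden-layer output matrix (sigmoid assumed)
    beta = np.linalg.pinv(H) @ Y            # minimum-norm least-squares output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Apply the fixed hidden layer and the learned output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy usage: XOR with one-hot targets.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
W, b, beta = train_elm(X, Y, n_hidden=20)
pred = predict_elm(X, W, b, beta).argmax(axis=1)
```

Only `beta` is learned; the hidden layer stays fixed, which is why training reduces to a single linear least-squares solve.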

Manifold Regularization Framework.
The manifold regularization framework is built on the manifold assumption that points close to each other in the intrinsic geometry of the marginal distribution should share similar labels, and it can effectively handle training datasets consisting of both labeled and unlabeled data [18]. Labeled data $\{(x_i, y_i)\}_{i=1,\ldots,l}$ are generated according to a probability distribution $P$, and unlabeled data $\{x_j\}_{j=1,\ldots,u}$ are drawn according to the marginal distribution $P_X$ of $P$. By minimizing the following cost function, the manifold regularization framework obtains an optimal classification function $f^{*}(\cdot)$:
$$f^{*} = \arg\min_{f} \frac{1}{l}\sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \|f\|_I^2, \tag{4}$$
where $V(\cdot)$ represents the loss function, the regularization term $\|f\|_K^2$ represents the complexity of the classifier, and the regularization term $\|f\|_I^2$ represents the smoothness of the sample distribution, which can be approximated as
$$\|f\|_I^2 \approx \frac{1}{(l+u)^2}\,\mathbf{f}^{T}\mathbf{L}\,\mathbf{f}, \tag{5}$$
where $\mathbf{f} = [f(x_1), \ldots, f(x_{l+u})]^{T}$, $1/(l+u)^2$ is the normalization coefficient for the empirical estimate, $\mathbf{L} = \mathbf{D} - \mathbf{W}$ is the Laplacian matrix of the whole data, $\mathbf{W}$ is the weight matrix in which each element $W_{ij}$ represents the similarity weight between $x_i$ and $x_j$, and $\mathbf{D}$ is a diagonal matrix in which $D_{ii} = \sum_{j=1}^{l+u} W_{ij}$.
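The smoothness term can be made concrete with a small numerical sketch. The Gaussian similarity, the fully connected graph, the bandwidth `sigma`, and the function names are assumptions chosen for illustration; the text does not specify how the weight matrix is built. Note that the variable `L` below is the graph Laplacian, not the number of hidden nodes.

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Return (W, L) for a fully connected Gaussian-similarity graph over the rows of X."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)      # no self-loops
    D = np.diag(W.sum(axis=1))    # D_ii = sum_j W_ij
    return W, D - W

def smoothness(f, L):
    """Empirical smoothness estimate f^T L f / (l + u)^2."""
    n = len(f)
    return float(f @ L @ f) / n ** 2

X = np.array([[0.0], [0.1], [5.0], [5.1]])   # two tight groups of points
W, L = graph_laplacian(X)
smooth_f = np.array([1.0, 1.0, -1.0, -1.0])  # constant within each group
rough_f = np.array([1.0, -1.0, 1.0, -1.0])   # changes sign inside each group
```

A function that is constant on each tight group receives a much smaller penalty than one that changes sign between neighboring points, which is the behavior the manifold assumption asks for.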

Fruit Fly Optimization Algorithm (FOA).
The steps of FOA are as follows.
Step 1. Randomly initialize the location of the fruit fly swarm: $X_{\mathrm{axis}}$, $Y_{\mathrm{axis}}$.
Step 2. Randomly generate the direction and distance for food searching by the osphresis of each individual $i$: $X_i = X_{\mathrm{axis}} + \mathrm{RandomValue}$, $Y_i = Y_{\mathrm{axis}} + \mathrm{RandomValue}$.
Step 3. Estimate the distance $\mathrm{Dist}_i$ between each individual and the origin, and set the reciprocal of $\mathrm{Dist}_i$ as the smell concentration judgment value $S_i$: $\mathrm{Dist}_i = \sqrt{X_i^2 + Y_i^2}$, $S_i = 1/\mathrm{Dist}_i$.
Step 4. Substitute $S_i$ into the smell concentration judgment function (the fitness function of the optimization) to calculate the smell concentration $\mathrm{Smell}_i$ of each individual fruit fly: $\mathrm{Smell}_i = \mathrm{Function}(S_i)$.
Step 5. Find the individual fruit fly with the maximal smell concentration: [bestSmell bestindex] = max(Smell), in which bestindex is the location of the best individual.
Step 6. Keep the best smell concentration value and the corresponding coordinates, and let the swarm fly toward that location by vision: Smellbest = bestSmell, $X_{\mathrm{axis}} = X_{\mathrm{bestindex}}$, $Y_{\mathrm{axis}} = Y_{\mathrm{bestindex}}$.
Step 7. Repeat Step 2 to Step 5 iteratively until the termination condition is reached; in each iteration, judge whether the smell concentration is better than the previous one and, if so, execute Step 6.
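The FOA steps can be sketched as a short routine. This is an illustrative sketch under stated assumptions, not the authors' code: the swarm size, the iteration count, the search radius of 1, the initial location range, and the example fitness function are all hypothetical choices.

```python
import math
import random

def foa(fitness, n_flies=20, n_iter=200, seed=0):
    """Basic FOA that maximizes fitness(S), with S = 1 / distance to the origin."""
    rng = random.Random(seed)
    # Step 1: random initial swarm location.
    x_axis, y_axis = rng.uniform(0, 1), rng.uniform(0, 1)
    best_smell, best_s = -math.inf, None
    for _ in range(n_iter):
        candidates = []
        for _ in range(n_flies):
            # Step 2: random direction and distance around the swarm location.
            x = x_axis + rng.uniform(-1, 1)
            y = y_axis + rng.uniform(-1, 1)
            # Step 3: distance to the origin and smell concentration judgment value.
            dist = math.hypot(x, y)
            if dist == 0.0:
                continue
            s = 1.0 / dist
            # Step 4: smell concentration from the fitness function.
            candidates.append((fitness(s), x, y, s))
        # Step 5: best individual of this iteration.
        smell, x, y, s = max(candidates)
        # Steps 6 and 7: move the swarm only if the smell concentration improved.
        if smell > best_smell:
            best_smell, best_s = smell, s
            x_axis, y_axis = x, y
    return best_s, best_smell

# Example: the fitness peaks at S = 0.5.
best_s, best_smell = foa(lambda s: -(s - 0.5) ** 2)
```

Because the decision value is the reciprocal of a distance, the basic algorithm searches over positive values of S only, which is one reason variants adjust the search area explicitly.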

The Proposed Classifier: CDMR-ELM
CDMR Framework.
In this paper, we consider a multiclass dataset with $l$ labeled data $\{(x_i, y_i)\}_{i=1,\ldots,l}$ and $u$ unlabeled data $\{x_i\}_{i=l+1,\ldots,l+u}$. First, to obtain the clustering discrimination of the whole data, an unsupervised fuzzy clustering method [19] is used to divide the whole dataset into fuzzy clusters that reflect the underlying cluster structure. All cluster labels are preserved to form a cluster vector of dimension $(l+u)$, whose $i$th element, an integer between 1 and the number of clusters, is the fuzzy clustering label of the $i$th data. To take the reliability of the clustering result fully into account during learning, a membership vector of the same dimension is defined, whose $i$th element represents the membership degree of the $i$th data and is inversely proportional to the distance between point $x_i$ and the center of its fuzzy cluster. A clustering reliability matrix is then set to describe the reliability of clustering. The clustering discrimination matrix, of size $(l+u)\times(l+u)$, is defined on the basis of the clustering labels and the clustering reliability matrix; its element $(i, j)$, with $i, j = 1, \ldots, (l+u)$, indicates whether the $i$th and $j$th instances belong to the same fuzzy cluster. For labeled data, the class labels are retained to form a labeled discrimination matrix with elements indexed by $i, j = 1, \ldots, l$. The final discrimination matrix, also of size $(l+u)\times(l+u)$, is built by combining the clustering discrimination matrix and the labeled discrimination matrix. In summary, element $(i, j)$ of the final discrimination matrix is 1 in two situations: first, when the $i$th and $j$th instances belong to the same class (for labeled data) or to the same cluster (for unlabeled data); second, when the reliability of clustering is low. Otherwise, the element is −1.
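Further, the optimal classification solution should satisfy the twinning constraints regularization (13), which contains lower intracluster compactness and higher intercluster separability. In (13), the intracluster weight matrix has element $(i, j)$ equal to 1 when the corresponding element of the final discrimination matrix is 1 and equal to 0 when that element is −1. The intercluster weight matrix has element $(i, j)$ equal to 1 when the corresponding element of the final discrimination matrix is −1 and equal to 0 when that element is 1.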
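The construction of the final discrimination matrix and of the two twinning weight matrices can be illustrated with a short sketch. Several details here are assumptions, because the defining equations are not reproduced in this text: reliability is represented as a per-sample score in [0, 1], `threshold` is a hypothetical cut-off for "low reliability", labeled samples are assumed to come first, and class labels are assumed to take precedence for pairs of labeled samples.

```python
def discrimination_matrix(cluster_labels, reliability, class_labels, threshold=0.5):
    """Build a +1/-1 pairwise discrimination matrix.

    cluster_labels: cluster index of every sample (labeled samples first).
    reliability: per-sample clustering reliability in [0, 1] (hypothetical scale).
    class_labels: class labels of the first len(class_labels) samples.
    threshold: hypothetical cut-off below which a clustering result is distrusted.
    """
    n, l = len(cluster_labels), len(class_labels)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < l and j < l:
                # Pair of labeled samples: use the class labels.
                same = class_labels[i] == class_labels[j]
            elif min(reliability[i], reliability[j]) < threshold:
                # Unreliable clustering: treat the pair as similar.
                same = True
            else:
                same = cluster_labels[i] == cluster_labels[j]
            D[i][j] = 1 if same else -1
    return D

def twin_weights(D):
    """Intracluster weights are 1 where D is 1; intercluster weights are 1 where D is -1."""
    intra = [[1 if d == 1 else 0 for d in row] for row in D]
    inter = [[1 if d == -1 else 0 for d in row] for row in D]
    return intra, inter

cluster_labels = [0, 1, 0, 1, 1]
reliability = [0.9, 0.9, 0.9, 0.9, 0.2]
class_labels = ["a", "b"]   # only the first two samples are labeled
D = discrimination_matrix(cluster_labels, reliability, class_labels)
intra, inter = twin_weights(D)
```

In this toy example the last sample has low reliability, so every pair involving it is marked 1 and only the smoothness behavior of ordinary manifold regularization applies to it.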
Finally, the proposed framework utilizes the cluster assumption; that is, data in the same cluster with high similarity, weighted by clustering reliability, should share the same class label, and otherwise they should possess different class labels. By integrating the clustering discrimination of labeled and unlabeled data with the twinning constraints regularization described in (13), the optimization problem of the proposed CDMR framework is formulated as (14). In (14), $W$ is the weight matrix of the whole data, whose nonnegative element $W_{ij}$ represents the similarity between instance $x_i$ and instance $x_j$ according to the distance between them in the fuzzy clustering manifold structure, and the final discrimination matrix of size $(l+u)\times(l+u)$ integrates the fuzzy clustering discrimination with the labeled discrimination. The pairwise regularization term of (14) in $f(x_i)$ and $f(x_j)$ represents the fuzzy clustering discrimination of both labeled and unlabeled data.

CDMR Framework Based ELM (CDMR-ELM).
Based on ELM and the proposed CDMR framework, we construct the semisupervised classification model CDMR-ELM. Substituting (1) into (14) gives the objective function in terms of the output weight $\beta$. Zeroing the gradient of the objective function (16) with respect to $\beta$ yields the solution of CDMR-ELM, in which $I$ is the identity matrix of dimension $l+u$. According to (1) and (2), the decision function of the proposed semisupervised classification model for an input $x$ is $f(x) = h(x)\beta$, with $\beta$ set to the CDMR-ELM solution. Given the hidden weights $w$, the biases $b$, and the three trade-off parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ in advance, the MSE of classification (20), which measures the squared difference between the predicted output and the actual output over the input data, should be minimized to improve accuracy.

The Multiobjective Optimization Problem and Solutions.
Considering that the number of hidden layer nodes strongly influences semisupervised classification efficiency and training time, the multiobjective optimization problem is to find the optimal single-hidden-layer feedforward networks (SLFNs) with a lower MSE and a smaller number of hidden nodes simultaneously:
$$\min\ \bigl(\mathrm{MSE}(w, b, \lambda_1, \lambda_2, \lambda_3),\ L\bigr). \tag{21}$$
The solutions of this multiobjective optimization problem are represented as follows:
$$(w, b, \lambda_1, \lambda_2, \lambda_3, L). \tag{22}$$
The trade-off parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ control the reliability of the clustering discrimination obtained from the clustering method. The larger these parameters are, the more important the fuzzy clustering discrimination becomes. If they are small, CDMR degenerates to the smoothness assumption of manifold regularization [12]. Therefore, the values of $\lambda_1$, $\lambda_2$, and $\lambda_3$ should be optimized to achieve better classification accuracy.
Unlike a traditional single-objective optimization problem, a multiobjective optimization problem generally has no single solution that simultaneously minimizes all objectives [20-22]. This paper therefore looks for a set of optimal solutions, each of which cannot be improved in one objective without deteriorating at least one of the remaining objectives.
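The notion of a set of optimal solutions can be illustrated with a small non-dominated filter over (MSE, number of hidden nodes) pairs. This is a generic Pareto-front sketch with made-up candidate values, not the selection rule used inside MOFOA.

```python
def pareto_front(solutions):
    """Return the non-dominated (mse, n_hidden) pairs; both objectives are minimized."""
    def dominates(a, b):
        # a dominates b if it is no worse in both objectives and differs from b.
        return a[0] <= b[0] and a[1] <= b[1] and a != b
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Hypothetical candidates: (MSE, number of hidden nodes).
candidates = [(0.10, 50), (0.08, 80), (0.12, 40), (0.10, 60), (0.09, 80)]
front = pareto_front(candidates)
```

Here `(0.10, 60)` is discarded because `(0.10, 50)` reaches the same MSE with fewer nodes, and `(0.09, 80)` is discarded because `(0.08, 80)` reaches a lower MSE with the same number of nodes.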

MOFOA-CDMR-ELM Classifier.
Considering that FOA may sink into local extrema and converge prematurely [15], this paper improves the traditional FOA in the following two aspects: (1) employing MSE to evaluate the fitness function and (2) adopting an adaptively reduced search area for the decision variables, given by (24). In (24), $K$ is the number of iterations, $r(k)$ represents the adaptive search area for the $k$th iteration, $k \in [1, K]$ is the current iteration index, and $r_{\max}$ is the maximum search area, set to 1/2, which is a quarter of the gap between the high limit and the low limit of $w$ and $b$.
The algorithm starts by initializing the fruit fly swarm $P_1$, whose individuals are each represented as the vector in (22), in which $(w, b)$ are randomly assigned from a uniform distribution between −1 and 1, $(\lambda_1, \lambda_2, \lambda_3)$ are limited to the range $(2^{-24}, 2^{24})$, and $L$ is between 1 and the upper limit for hidden nodes. Next, two variables, add and reduce, are introduced to control the search for the optimal $L$ by means of the relationship between $L$ and MSE. After $L$ has been adjusted, the new solutions are evaluated and an inner loop is run on them for the maximum number of inner iterations to adjust the parameters ($w$, $b$, $\lambda_1$, $\lambda_2$, and $\lambda_3$), with the three trade-off parameters tuned within their range. Finally, the values of add and reduce are reset. The main loop is repeated $K$ times to search for the global optimal swarm $P^{*}$ in $P$.
The MOFOA for optimizing CDMR-ELM is described in Algorithm 1.

Algorithm 1: MOFOA for optimizing CDMR-ELM.
(1) Build the initial fruit fly swarm $P_1$, in which each individual is in the form of (22).
(2) Evaluate the fitness function on the training set by using (20).
(3) Set the add and reduce variables.
for k = 1 to K do
    (4) According to add and reduce, adjust $L$ of each individual in the swarm.
    (5) Evaluate the new swarm on the training set by using (20).
    (6) for i = 1 to the maximum number of inner iterations do
        (7) for each individual in the swarm
            (8) Adjust ($w$, $b$, $\lambda_1$, $\lambda_2$, $\lambda_3$) by using the adaptively reduced search area in (24).
            (9) if the MSE of the new individual is better than that of the previous one then
                (10) The new individual replaces the previous one.
    (11) Reset the add and reduce variables.
(12) Reserve the global optimal solutions $P^{*}$ in population $P$.
In this paper, we suppose that the relationship between the MSE and $L$ is parabolic or linear. If $L$ is proportional to MSE, add is set to 0 and reduce is set to 1. If $L$ is inversely proportional to MSE, add is set to 1 and reduce is set to 0. If MSE does not improve by either increasing or decreasing the number of nodes, both add and reduce are set to 1. If MSE decreases both when $L$ increases and when $L$ decreases, both add and reduce are set to 0. The variables add and reduce guide the search for $L$ as shown in Algorithm 2.
In Algorithm 2, $L_{\max}$, $L_{\min}$, and $L_{\mathrm{mid}}$ are the maximum, minimum, and middle values of $L$ in the population, the random value is drawn uniformly from (0, 1), and $L$ is bounded by the upper limit for hidden nodes.
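The four rules for the add and reduce variables can be written as a small decision function. How the MSE values for fewer and more hidden nodes are obtained is not specified in the text reproduced here, so the three arguments and the function name `set_flags` are hypothetical; the sketch only encodes the stated rules and does not reproduce Algorithm 2.

```python
def set_flags(mse_fewer, mse_current, mse_more):
    """Return (add, reduce) from the MSE seen with fewer, the current, and more hidden nodes."""
    better_with_more = mse_more < mse_current
    better_with_fewer = mse_fewer < mse_current
    if better_with_fewer and not better_with_more:
        return 0, 1   # MSE grows with L: only reduce
    if better_with_more and not better_with_fewer:
        return 1, 0   # MSE shrinks as L grows: only add
    if not better_with_more and not better_with_fewer:
        return 1, 1   # no improvement in either direction
    return 0, 0       # improvement in both directions

flags_growing = set_flags(mse_fewer=0.08, mse_current=0.10, mse_more=0.12)
flags_shrinking = set_flags(mse_fewer=0.12, mse_current=0.10, mse_more=0.08)
flags_flat = set_flags(mse_fewer=0.11, mse_current=0.10, mse_more=0.11)
flags_both = set_flags(mse_fewer=0.09, mse_current=0.10, mse_more=0.09)
```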

Datasets and Experiment Setup.
In order to evaluate the accuracy and efficiency of the proposed MOFOA-CDMR-ELM classifier, we perform a set of experiments on several real-world datasets from the UCI machine learning repository and benchmark repository frequently used for semisupervised learning [23]. The details of datasets are shown in Table 1.
Comparison experiments are implemented on two types of classifiers: supervised classifiers, including SVM and ELM, and semisupervised classifiers, including SSL-ELM [5,24], LapRLS [7], LapSVM [8], and the proposed classifier. Each dataset is divided into three subsets: a testing set, a validation set, and a training set, which is further partitioned into a fixed labeled set and an unlabeled set. The labeled set must contain at least one sample of each class. The training set is used to train the classifiers. The validation set, which contains labeled data, is used for optimal model selection. The testing set is used to verify classifier performance and efficiency. All experiments are implemented in MATLAB 7.0 running on a PC with a 3.4 GHz CPU and 4.0 GB of RAM. The MOFOA settings are given in Table 2: number of individuals in swarm, 20; number of outer iterations, 100; number of inner iterations, 100; and the maximal number of hidden nodes.
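The requirement that the labeled set contains at least one sample of each class can be met with a simple two-stage draw. The following is a sketch under assumptions: the function name, the seed handling, and the toy label list are illustrative, and the experiments themselves were run in MATLAB rather than Python.

```python
import random

def split_labeled_unlabeled(labels, n_labeled, seed=0):
    """Pick labeled indices with at least one sample per class; the rest are unlabeled."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    if n_labeled < len(by_class):
        raise ValueError("n_labeled must be at least the number of classes")
    # Stage 1: one guaranteed sample per class.
    labeled = [rng.choice(idxs) for idxs in by_class.values()]
    # Stage 2: fill the remaining slots uniformly at random.
    chosen = set(labeled)
    remaining = [i for i in range(len(labels)) if i not in chosen]
    labeled += rng.sample(remaining, n_labeled - len(labeled))
    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(labels)) if i not in labeled_set]
    return sorted(labeled), unlabeled

labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
labeled, unlabeled = split_labeled_unlabeled(labels, n_labeled=4)
```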

Effectiveness of the Proposed MOFOA.
To evaluate the effectiveness of the proposed optimization method in searching for the optimal parameters as in (23), we compare three classifiers: the proposed CDMR-ELM, FOA-CDMR-ELM, and MOFOA-CDMR-ELM. CDMR-ELM, with random hidden weights and biases $(w, b)$, obtains the optimal trade-off parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ by running 10-fold cross validation on the validation set 100 times. FOA-CDMR-ELM uses the classification error rate to guide the search for the optimal weights and biases $(w, b)$ as well as the regularization weights $\lambda_1$, $\lambda_2$, and $\lambda_3$, given a fixed number of hidden nodes $L$. MOFOA-CDMR-ELM searches for an appropriate number of hidden nodes, the weights and biases $(w, b)$, and the regularization weights $\lambda_1$, $\lambda_2$, and $\lambda_3$ by simultaneously minimizing the MSE and the number of hidden nodes $L$. Table 3 shows the mean classification accuracy and the number of hidden nodes of the three classifiers on all datasets. Data in bold type represent the best classification result and number of hidden nodes.
From Table 3, we can see that the proposed MOFOA-CDMR-ELM classifier is better than the other two competing classifiers in 75% of the cases. This result verifies the effectiveness of the proposed FOA-based optimization method, since it adopts an adaptively reduced search area in each iteration to reduce the possibility of sinking into local extrema and of prematurity. Further, by minimizing both the MSE and the number of hidden nodes, MOFOA obtains superior networks with fewer hidden nodes while maintaining better accuracy.

Comparison of Performance.
We compare classification accuracy between some state-of-the-art supervised classifiers and semisupervised classifiers on above-mentioned datasets to evaluate efficiency and effectiveness of the proposed classifier. Table 4 shows the mean value and standard deviation of classification accuracy and Table 5 shows the mean value and standard deviation of running time of all the compared classifiers on 8 datasets.
From Table 4, we can conclude the following: (1) LapRLS and LapSVM outperform supervised classifiers SVM and ELM in semisupervised learning even with a few labeled data, since LapRLS and LapSVM adopt manifold regularization to utilize unlabeled data according to nonlinear geometrical manifold structure embedding in the whole data.
(2) Among three existing semisupervised classifiers, by constructing a framework that integrates manifold assumption with constraints between all the labeled data to relieve misclassification in boundary area and enhance the smoothness of decision function, SSL-ELM obtains better classification accuracy than LapRLS and LapSVM.
(3) The proposed classifier outperforms SSL-ELM especially on multiclass datasets since it adopts unsupervised fuzzy clustering method and considers inner cluster and intercluster constraints not only between labeled data but also between unlabeled data. Further, the proposed MOFOA plays an important role in enhancing the performance by searching for optimal parameters.
From Table 5, we can see that the training time of SVM and ELM is clearly less than that of the semisupervised classifiers, especially on large datasets, since they are trained only on labeled data. For a fair comparison among the four semisupervised classifiers, which are trained on both labeled and unlabeled data, the training time of the LapSVM classifier on multiclass datasets is longer than that of the others. This is possibly because the one-versus-rest method seriously increases the running time of the iterative process. The proposed classifier optimized by MOFOA obtains optimal model parameters with high classification accuracy and fewer hidden nodes, which leads to fast learning, because in ELM the training time grows with the number of hidden nodes. In general, the proposed classifier achieves better performance with competitive learning speed.

Performance with Different Number of Labeled and Unlabeled Data.
The previous experiments are implemented with a fixed labeled set and a fixed unlabeled set. If the numbers of labeled and unlabeled data vary gradually, the performance of the classifiers changes accordingly. Figure 1 shows the performance variation of ELM, LapRLS, LapSVM, SSL-ELM, and the proposed classifier on two representative datasets, Shuttle and Seeds, with different numbers of labeled data, obtained by varying the proportion of labeled to unlabeled data in the training set. Figure 2 shows the performance variation of these classifiers with different numbers of unlabeled data. From Figure 1, we can observe that, as the number of labeled data increases, the classification accuracy of every classifier improves steadily. Further, the proposed classifier is more accurate than the others throughout. From Figure 2, we can see that, as the number of unlabeled data increases, the classification accuracy of ELM remains unchanged, since it works only on labeled data, while the accuracy of the semisupervised classifiers improves markedly. Further, even with very few unlabeled data, the proposed classifier outperforms SSL-ELM, LapRLS, and LapSVM, because it constructs the manifold structure by fully utilizing both unlabeled and labeled data, which is effective for semisupervised learning. In general, the results verify that the proposed classifier obtains better performance in dynamic semisupervised classification, since it integrates the discrimination of both labeled and unlabeled data with the twinning constraints of fuzzy clusters.

Conclusion
In this paper, we propose a semisupervised learning framework, named CDMR, that is based on the clustering discrimination of the whole data and twinning constraints regularization. Further, we integrate ELM with the proposed framework to achieve semisupervised classification.
To enhance the classification accuracy and training speed of the proposed classifier, we build a novel multiobjective FOA that simultaneously minimizes the number of hidden nodes and the MSE. It obtains optimal classifier parameters such that no other SLFN has higher accuracy with fewer or an equal number of hidden nodes. Experimental results on several datasets confirm the effectiveness and efficiency of the proposed MOFOA-CDMR-ELM classifier. In the future, we will study the sparsity of the matrix multiplications involved in order to further reduce training time.