Sparse extreme learning machine classifier exploiting intrinsic graphs☆
Introduction
In the training of extreme learning machine (ELM)-based single hidden layer feedforward (SLFN) networks, the hidden layer parameters are randomly assigned, while the network output parameters are subsequently calculated analytically. Similar approaches have also been shown to be efficient in several other neural network training methods [4], [5], [24], [26], [29], as well as in other learning processes [25]. Algorithms following this approach assume that the learning processes used to determine the hidden layer weights and the output weights need not be connected. In addition, it is assumed that the hidden layer weights can be randomly assigned, thereby defining a random (nonlinear) mapping of the input space to a new (usually high-dimensional) feature space. By using a large number of (independent) hidden layer weights, the problem to be solved can be transformed into a linear one in the new feature space; thus, linear techniques such as mean square estimation can be used to determine the network's output weights. The fact that the hidden and output weights are determined independently has a number of advantages, for example facilitating the implementation of parallel/distributed systems. In addition, this approach has been shown to perform well in many classification problems.
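To make this two-stage procedure concrete, the following is a minimal NumPy sketch of ELM training (our illustration; the function names, dimensions, and the tanh activation are assumptions, not taken from the paper):

```python
import numpy as np

def elm_train(X, T, n_hidden=100, seed=0):
    """Minimal ELM training: random hidden layer + least-squares output layer.

    X: (N, D) input vectors; T: (N, C) target matrix (e.g., one-vs-rest targets).
    """
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    W = rng.standard_normal((D, n_hidden))   # random input weights (never trained)
    b = rng.standard_normal(n_hidden)        # random hidden-layer biases
    H = np.tanh(X @ W + b)                   # (N, L) hidden-layer outputs
    # Moore-Penrose pseudoinverse: the minimum-norm least-squares solution
    beta = np.linalg.pinv(H) @ T             # (L, C) output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta         # (N, C) network outputs
```

Note that `np.linalg.pinv` returns the smallest-norm solution among all least-squares minimizers, which connects directly to the weight-norm argument discussed next.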
In the original ELM algorithm [13], the trained network tends to reach not only the smallest training error but also the smallest output weight norm. For networks reaching small training errors, smaller output weight norms indicate better generalization performance [3]. Since the first proposal of ELM [13], several optimization schemes have been proposed for calculating the network output parameters, each highlighting different properties of ELM networks [2], [7], [10], [11], [12], [14], [15], [17], [18], [22], [28]. Although the determination of the hidden layer outputs is based on randomly assigned input parameters, it has been shown that SLFN networks trained by the ELM algorithm have the properties of global approximators [11], [19], [20]. In addition, it has been shown that ELM networks are able to outperform other sophisticated classification schemes, such as the support vector machine (SVM) classifier [2], [9], [12].
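For reference, a commonly used regularized squared-loss formulation (in the spirit of [12]; the notation below is ours and not taken from this paper) trades the training error against the output weight norm and admits a closed-form solution:

$$\min_{\mathbf{W}_{\mathrm{out}}}\ \frac{1}{2}\,\lVert \mathbf{W}_{\mathrm{out}} \rVert_F^2 \;+\; \frac{c}{2}\,\lVert \mathbf{H}\mathbf{W}_{\mathrm{out}} - \mathbf{T} \rVert_F^2, \qquad \mathbf{W}_{\mathrm{out}} \;=\; \Big(\mathbf{H}^{T}\mathbf{H} + \tfrac{1}{c}\,\mathbf{I}\Big)^{-1}\mathbf{H}^{T}\mathbf{T},$$

where $\mathbf{H}$ collects the hidden-layer outputs of the training vectors row-wise, $\mathbf{T}$ collects the corresponding targets, and $c>0$ controls the trade-off.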
Recently, an optimization scheme exploiting the hinge loss of the training error for calculating the network output weights has been proposed [2]. It exploits the fact that the network output weights can be expressed as a sparse combination of the training data representations in the feature space determined by the network hidden layer outputs. Thus, in both the original and kernel ELM formulations, testing with networks trained using the hinge loss of the training error is faster than testing with networks trained using the squared loss. In order to speed up the training of the so-called sparse ELM (S-ELM) networks, a sequential minimal optimization (SMO)-based algorithm has also been proposed by Bai et al. [2]. By exploiting such an optimization approach, S-ELM has been shown to be both effective and efficient: experimental results show that it is able to outperform ELM formulations exploiting the squared loss of the training error, while its training and test computational costs are lower than those of ELMs and SVMs [2].
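The sparsity property can be illustrated with a small sketch that trains a hinge-loss (SVM) solver on the hidden-layer outputs; this uses an off-the-shelf solver in place of the SMO procedure of Bai et al. [2], and all data and parameter values are illustrative only:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
y = np.sign(X[:, 0] + X[:, 1])                 # toy binary labels in {-1, +1}

# Random hidden layer: the ELM-space representations of the training data
W = rng.standard_normal((8, 100))
b = rng.standard_normal(100)
H = np.tanh(X @ W + b)

# Hinge loss in the ELM space: the resulting output weight vector is a
# linear combination of the support vectors only, hence sparse in the data
clf = SVC(kernel="linear", C=10.0).fit(H, y)
print(f"{clf.n_support_.sum()} of {len(y)} training samples carry nonzero duals")
```

Because only the support vectors enter the solution, computing network outputs at test time requires far fewer terms than a dense squared-loss solution.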
In this paper, we describe an optimization scheme for S-ELM-based SLFN network training that exploits intrinsic graph structures expressing class geometric relationships of the training data in the feature space determined by the network hidden layer outputs, referred to as the ELM space hereafter. This optimization scheme is also extended to intrinsic graph structures expressing class geometric relationships of the training data in the arbitrary-dimensional ELM spaces used in kernel ELM formulations [2], [16]. Intrinsic graphs have previously been exploited in SVM-based classification [1], [23] and in ELM networks using the squared loss of the training error [14], [15]; here, this approach is extended to S-ELM network training. It is shown that S-ELM networks trained by applying the adopted optimization scheme achieve better classification performance than S-ELM networks trained by applying the original optimization scheme of Bai et al. [2]. In addition, in order to exploit fast optimization algorithms like those proposed by Bai et al. [2] and Sha et al. [27], the application of the adopted optimization scheme in the original (kernel) ELM space is shown to be equivalent to the application of the original S-ELM optimization scheme in a transformed (kernel) ELM space.
The rest of the paper is structured as follows. In Section 2, an overview of the S-ELM algorithm is provided. In Section 3, an optimization scheme for S-ELM-based network training is described, which exploits geometric data information encoded in intrinsic graphs. In Section 4, the relation between the two optimization schemes is discussed. Experiments comparing the performance of S-ELM with that of our optimization scheme are described in Section 5. Finally, conclusions are drawn in Section 6.
Section snippets
Overview of S-ELM networks
Let us denote by $\mathcal{X} = \{\mathbf{x}_i \in \mathbb{R}^D,\ i = 1,\dots,N\}$ a set of N vectors and by $c_i \in \{1,\dots,C\}$ the corresponding class labels. We would like to employ $\mathcal{X}$ in order to train an SLFN network using the S-ELM algorithm [2]. Such a network consists of D input (equal to the dimensionality of $\mathbf{x}_i$), L hidden, and C output (equal to the number of classes involved in the classification problem) neurons. The number of hidden layer neurons L is a parameter of the S-ELM algorithm, and it is usually set to be much greater
S-ELM exploiting intrinsic graphs
In this section, an optimization scheme for SLFN network training that exploits intrinsic graph structures is described. Similar to S-ELM, we perform one-versus-rest classification. For the two-class problem discriminating class k from the remaining ones, the following optimization problem is solved for the calculation of the network output weight vector $\mathbf{w}_k$:
$$\min_{\mathbf{w}_k,\,\boldsymbol{\xi}}\ \frac{1}{2}\,\mathbf{w}_k^{T}\mathbf{S}\,\mathbf{w}_k + c\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad t_{ik}\,\mathbf{w}_k^{T}\boldsymbol{\phi}_i \ge 1-\xi_i,\ \ \xi_i \ge 0,\ \ i=1,\dots,N,$$
where $\boldsymbol{\phi}_i$ is the representation of $\mathbf{x}_i$ in the ELM space, $t_{ik}\in\{-1,1\}$ is the target of $\mathbf{x}_i$ for class k, $c>0$ weighs the hinge-loss term, and $\mathbf{S}$ is a matrix expressing geometric relationships of the training data in the ELM space.
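As one concrete example of such a matrix (an illustrative choice on our part; the framework admits any intrinsic graph), a fully connected within-class graph leads to an LDA-like scatter of the ELM-space representations:

```python
import numpy as np

def intrinsic_graph_matrix(Phi, labels):
    """S = Phi @ Lap @ Phi.T for a fully connected within-class intrinsic graph.

    Phi: (L, N) matrix with the ELM-space representation of sample i in column i.
    """
    N = Phi.shape[1]
    A = np.zeros((N, N))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        A[np.ix_(idx, idx)] = 1.0 / idx.size   # equal within-class edge weights
    Lap = np.diag(A.sum(axis=1)) - A           # graph Laplacian of the intrinsic graph
    return Phi @ Lap @ Phi.T                   # (L, L) matrix used as the regularizer
```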
Discussion
By comparing (6) with (26) and (10) with (31), it can be seen that the optimization problems solved by the two approaches are similar in both the original and kernel formulations. In this section, the application of the optimization scheme described in Section 3 in the original ELM space is shown to be equivalent to the application of the S-ELM algorithm [2] in a transformed ELM space, for both the original and kernel S-ELM formulations.
For the optimization scheme exploiting random hidden layer
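A minimal sketch of the transformation underlying this equivalence, under the assumption that the regularizer has the quadratic form $\mathbf{w}_k^{T}\mathbf{S}\,\mathbf{w}_k$ as in Section 3 (the small diagonal loading is our addition, for numerical stability):

```python
import numpy as np

def to_transformed_elm_space(Phi, S, eps=1e-6):
    """Substituting w = S^{-1/2} w~ turns w.T @ S @ w into ||w~||^2, while the
    hinge constraints keep their form on the transformed data S^{-1/2} @ Phi.
    Standard S-ELM can therefore be run unchanged in the transformed space."""
    vals, vecs = np.linalg.eigh(S + eps * np.eye(S.shape[0]))
    S_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric inverse square root
    return S_inv_sqrt @ Phi, S_inv_sqrt

# Usage: solve standard S-ELM on Phi_t for w_tilde, then map back:
#   Phi_t, S_inv_sqrt = to_transformed_elm_space(Phi, S)
#   w_k = S_inv_sqrt @ w_tilde
```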
Experiments
In this section, we present experiments conducted to compare the performance of our method with that of S-ELM. To this end, twelve publicly available datasets from the machine learning repository of the University of California, Irvine (UCI) [8] were used. Table 1 provides information concerning the datasets used in our experiments. The datasets were normalized so as to have zero mean and unit standard deviation.
As there is no widely adopted experimental protocol for these datasets, the fivefold cross-validation protocol was applied.
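For concreteness, a sketch of this evaluation protocol (using scikit-learn utilities of our choosing; the SVC classifier is a placeholder for the compared methods, and the iris data stands in for a UCI dataset):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)              # stand-in for a UCI dataset
accs = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    scaler = StandardScaler().fit(X[tr])       # zero mean, unit standard deviation
    clf = SVC().fit(scaler.transform(X[tr]), y[tr])   # placeholder classifier
    accs.append(clf.score(scaler.transform(X[te]), y[te]))
print(f"five-fold accuracy: {np.mean(accs):.3f} +/- {np.std(accs):.3f}")
```

Fitting the scaler on the training fold only avoids leaking test-fold statistics into the normalization.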
Conclusions
In this paper, we described an optimization scheme to calculate the output weights of an SLFN network using the S-ELM classification scheme. This optimization scheme exploits data relationships in the ELM space in order to incorporate geometric information in the derived decision function. By following this approach, it is expected that better generalization performance can be achieved, compared with the solutions obtained by applying the S-ELM algorithm. We have experimentally evaluated this approach on twelve publicly available datasets, where it was able to outperform the original S-ELM algorithm.
Acknowledgment
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement number 316564 (IMPART).
References (30)
- et al., Regularized extreme learning machine for multi-view semi-supervised action recognition, Neurocomputing (2014)
- et al., On the kernel extreme learning machine classifier, Pattern Recognit. Lett. (2015)
- et al., Fully complex extreme learning machine, Neurocomputing (2005)
- et al., Learning and generalization characteristics of random vector functional-link net, Neurocomputing (1994)
- et al., A study on effectiveness of extreme learning machine, Neurocomputing (2011)
- et al., The no-prop algorithm: a new learning algorithm for multilayer neural networks, Neural Netw. (2013)
- et al., Exploiting graph embedding in support vector machines, IEEE International Workshop on Machine Learning for Signal Processing (2012)
- et al., Sparse extreme learning machine for classification, IEEE Trans. Cybern. (2014)
- The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Trans. Inform. Theory (1998)
- et al., Multivariable functional interpolation and adaptive networks, Complex Syst. (1988)
- A rapid supervised learning neural network for function interpolation and approximation, IEEE Trans. Neural Netw.
- Pattern Recognition: A Statistical Approach
- Error minimized extreme learning machine with growth of hidden nodes and incremental learning, IEEE Trans. Neural Netw.
- An insight into extreme learning machines: random neurons, random features and kernels, Cogn. Comput.
☆ This paper has been recommended for acceptance by Y. Liu.