Functional extreme learning machine for regression and classification

Although Extreme Learning Machine (ELM) can learn thousands of times faster than traditional slow gradient algorithms for training neural networks, its fitting accuracy is limited. This paper develops the Functional Extreme Learning Machine (FELM), a novel regressor and classifier. It takes functional neurons as the basic computing units and uses functional equation-solving theory to guide the modeling process. The functional neuron functions of FELM are not fixed; its learning process consists of estimating or adjusting their coefficients. FELM follows the spirit of extreme learning: it solves for the generalized inverse of the hidden layer neuron output matrix through the principle of minimum error, obtaining the optimal hidden layer coefficients without iteration. To verify the performance of the proposed FELM, it is compared with ELM, OP-ELM, SVM and LSSVM on several synthetic datasets, the XOR problem, and benchmark regression and classification datasets. The experimental results show that although the proposed FELM has the same learning speed as ELM, its generalization performance and stability are better.


Introduction
Artificial Neural Networks (ANNs) simulate human brain information processing through large numbers of neurons interconnected in particular patterns, together with efficient network learning algorithms [1]. Over the past few decades, ANNs have been widely applied across many fields due to their powerful nonlinear mapping and parallel computing capabilities [2].
So far, many neural network learning algorithms have been proposed and improved; Jian et al. summarized the well-known ones [3]. Among them, the backpropagation algorithm is one of the most mature neural network learning algorithms, and it is the best-known representative of the iterative gradient descent algorithms for supervised learning in neural networks.
Many improvements to ELM have been developed, including the incremental method [41-45], pruning [46-50] and adaptive approaches [51-53]. With the advent of the era of big data, storing and processing large-scale data has become an urgent need for enterprises, and the ensembling and parallelization of ELM have therefore become a research hotspot [54-57]. Lam and Wunsch learn features through an unsupervised feature learning (UFL) algorithm and then train them with a fast radial basis function (RBF) extreme learning machine, improving both the accuracy and speed of the algorithm [58]. Yao and Ge proposed the distributed parallel extreme learning machine (dp-ELM) and the hierarchical extreme learning machine [59]. Duan et al. proposed an efficient ELM with three parallel sub-algorithms based on the Spark framework (SELM) for big data classification [60]. Many researchers have turned their attention to deep ELM and conducted innovative research [61,62]. Dai et al. proposed the multilayer one-class extreme learning machine (OC-ELM) [63]. Zhang et al. proposed the multi-layer extreme learning machine (ML-ELM) [64]. Yahia et al. proposed a new structure based on an extreme learning machine auto-encoder with a deep learning structure and a composite wavelet activation function for hidden nodes [65].
Inappropriate initial hidden-layer parameters (input weights, hidden layer biases and the number of nodes) in the original ELM lead to poor classification results [21]. Although the improved algorithms mentioned above raise the generalization performance of the original ELM, they greatly increase its computational complexity. Therefore, a network learning algorithm with both fast learning speed and higher generalization performance is needed.
In this paper, we propose a new regression and classification model without iteratively optimized parameters in the spirit of extreme learning, called the functional extreme learning machine (FELM). FELM uses functional neurons (FNs) as its basic units and uses functional equation-solving theory to guide its modeling process [66-69]. Like ELM, the FELM parameter matrix is obtained by solving for the generalized inverse of the hidden layer neuron output matrix. FELM is a generalization of ELM: its distinctive network structure and simple, efficient learning algorithm allow it to solve not only the problems ELM can solve, but also many that ELM cannot. FELM also differs from ELM. The activation function of ELM is fixed, and ELM has weights and biases; the neuron functions of FELM are not fixed, and there are no weights and biases, only parameters (coefficients), which avoids the influence of random parameters (input weights, hidden layer biases) on the generalization performance and stability of the model. Its neuron functions are linear combinations of given basic functions, and the basic functions are selected according to the problem to be solved, without a prescribed number. The essence of FELM learning is the learning of parameters, and the parameter learning algorithm proposed in this paper requires no iterative computation and achieves high accuracy. FELM is compared with other popular techniques in terms of generalization performance and training time on several artificial datasets and benchmark regression and classification datasets. The results show that FELM is not only fast but also generalizes well.
The rest of this paper is organized as follows: Section 2 provides an overview of FN and ELM. In Section 3, the topology of functional extreme learning machine, the theory of structural simplification, and the parameter learning algorithm are described. Section 4 presents the performance comparison results of FELM, classical ELM, OP-ELM, classical SVM and LSSVM on regression and classification problems. Section 5 draws conclusions and discusses future research directions.

Preliminaries
The functional neuron model and the extreme learning machine (ELM) are briefly reviewed in this section.

Functional neuron model
The functional neuron was proposed by Enrique Castillo [66]. Figure 1(a) shows a functional neuron model, Figure 1(b) its expansion model, and Figure 1(c) the expanded model of the green part of Figure 1(b). The mathematical expression of the functional neuron is
$$y = f(x_1, x_2, \ldots, x_n),$$
where $x = \{x_1, x_2, \ldots, x_n\}$ and $y = \{y_1, y_2, \ldots, y_m\}$ are the input and output of the functional neuron function $f(\cdot)$, respectively. The functional neuron function can be expressed as a linear combination of basic functions:
$$f(x) = \sum_{i=1}^{q} a_i\, \phi_i(x),$$
where $\{\phi_i(x) \mid i = 1, 2, \ldots, q\}$ is any given family of basic functions, and different function families can be selected according to the specific problem and data.
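As a concrete illustration of the linear-combination form above, the following minimal sketch evaluates a functional neuron over a given basic-function family. The function name `functional_neuron` and the particular polynomial family are our own choices for illustration, not taken from the paper.

```python
import numpy as np

def functional_neuron(x, basis, coeffs):
    """Evaluate f(x) = sum_i a_i * phi_i(x) for one input vector x.

    x      : 1-D array of inputs (x_1, ..., x_n)
    basis  : list of callables phi_i, each mapping the input vector to a scalar
    coeffs : coefficients a_i, one per basic function
    """
    return sum(a * phi(x) for a, phi in zip(coeffs, basis))

# Example: a two-input neuron with the basic-function family {1, x1, x2, x1*x2}
basis = [lambda x: 1.0,
         lambda x: x[0],
         lambda x: x[1],
         lambda x: x[0] * x[1]]
coeffs = np.array([0.5, 1.0, -1.0, 2.0])

# 0.5*1 + 1.0*1 - 1.0*2 + 2.0*(1*2) = 3.5
y = functional_neuron(np.array([1.0, 2.0]), basis, coeffs)
```

Learning such a neuron means estimating `coeffs`; the basic functions themselves stay fixed once chosen.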

Extreme learning machine (ELM)
Based on generalized inverse matrix theory, Huang et al. proposed a new type of single hidden layer feedforward neural network algorithm with excellent performance, the extreme learning machine (ELM) [11]. The ELM network structure is shown in Figure 2. Given $N$ training samples $(x_j, t_j)$, where $x_j$ is an input and $t_j$ the corresponding expected output, let $g(\cdot)$ be an activation function, a nonlinear piecewise continuous function satisfying the ELM universal approximation theorem; commonly used choices are the sigmoid function, Gaussian function, etc. The mathematical model in Figure 2 is then expressed as
$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N, \qquad \text{or compactly} \qquad H\beta = T.$$
In ELM, $H$ is called the random feature mapping matrix, $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]$ is the input weight vector connecting the $i$th hidden layer neuron to the input layer neurons, $b_i$ is the bias of the $i$th hidden layer neuron, and $\beta = [\beta_1, \ldots, \beta_L]$ is the weight matrix between the output layer and the hidden layer. The hidden layer node parameters $(w_i, b_i)$ are randomly generated and remain unchanged.
The output weight is calculated as $\beta = H^{\dagger} T$, where $H^{\dagger}$ represents the Moore-Penrose generalized inverse of the hidden layer output matrix $H$.
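The two-step ELM recipe just described (random hidden layer, then one pseudoinverse solve) can be sketched as follows. This is a generic illustration, not the authors' code; the sigmoid activation and the function names are our assumptions.

```python
import numpy as np

def elm_train(X, T, L, rng=np.random.default_rng(0)):
    """Basic ELM: random hidden parameters (w_i, b_i), output weights by beta = H† T."""
    n = X.shape[1]
    W = rng.standard_normal((L, n))              # random input weights, never trained
    b = rng.standard_normal(L)                   # random hidden biases, never trained
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))     # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                 # Moore-Penrose solve, no iteration
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta

# Fit y = x^2 on [-1, 1] with 30 random hidden nodes
X = np.linspace(-1, 1, 200).reshape(-1, 1)
T = X[:, 0] ** 2
W, b, beta = elm_train(X, T, L=30)
err = np.max(np.abs(elm_predict(X, W, b, beta) - T))   # small residual on training data
```

Note that only `beta` is learned; the randomness of `(W, b)` is exactly the source of instability that FELM is designed to remove.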

Figure 2. Extreme learning machine network model.

Functional extreme learning machine topology
According to the example of the functional extreme learning machine in Figure 3, the network consists of the following elements: 1) A layer of input units: they receive the input data and feed it to the first layer of processing units. 2) One or more layers of processing units (i.e., functional neurons): each functional neuron is a computing unit which processes the input values from the input units or from the previous layer of functional neurons and provides input data to the next layer of neurons or to the output units, as in Figure 3(a).
3) A set of directed links: They connect the input units to the first layer of processing units, one layer of processing units to the next layer of processing units, and the last layer of computing units to the output units. The arrows indicate the direction in which information flows. Information flows only from the input layer to the output layer.
All these elements together constitute the network architecture of the functional extreme learning machine (FELM). The network architecture corresponds one-to-one to a functional equation, and the functional equation is the key to the FELM learning process. Therefore, once the network structure is determined, the generalization ability of the FELM is also determined. Note the following difference between standard neural networks and FELM networks: the basic computing unit of FELM is the functional neuron shown in Figure 1, whose neuron function is learnable rather than a fixed activation function, and which carries no weights or biases.

Structural simplification and unique expression of FELM
Structural simplification: Each initial network structure corresponds to a set of functional equations, and the functional-equation solution method is used to simplify the initial structure into an equivalent FELM that is optimal. The functional equation corresponding to Figure 3(a), Eq (5), involves six arbitrary continuous and strictly monotonic functions. According to Theorem 1 of [66], the general solution of functional Eq (5) is Eq (7), and by Eq (7), Eq (5) can be written as
$$y = K[f_1(x_1) + f_2(x_2) + f_3(x_3)]. \quad (8)$$
According to Eq (8), the corresponding topology can be drawn, as shown in Figure 3(b). Figures 3(a) and 3(b) are equivalent, meaning that they produce the same output for the same input. In the initial FELM structure, the functional neuron functions are multi-argument; in the simplified FELM structure, each functional neuron function takes a single argument.
Expression uniqueness of FELM: After structural simplification, the functional equation corresponding to the simplified functional network is $y = K[f_1(x_1) + f_2(x_2) + f_3(x_3)]$, but whether this expression is unique needs to be verified. Assume there are two sets of functional neuron functions $\{f_1, f_2, f_3, K\}$ and $\{g_1, g_2, g_3, K^{*}\}$ such that
$$K[f_1(x_1) + f_2(x_2) + f_3(x_3)] = K^{*}[g_1(x_1) + g_2(x_2) + g_3(x_3)] \quad (10)$$
for all inputs. Solving Eq (10) shows that the two function sets can differ only by arbitrary constants, and substituting any such constants into Eq (7) yields the same input-output mapping. Therefore, the expression of Eq (8) is unique.

Functional extreme learning machine learning algorithm
The FELM in Figure 3(b) is taken as an example to illustrate its parameter learning process. Write Eq (8) as
$$z = f_1(x_1) + f_2(x_2) + f_3(x_3), \quad (12)$$
where $z$ represents $K^{-1}(y)$. Each neuron function is a linear combination of given linearly independent basic functions, that is,
$$f_i(x) = \sum_{j=1}^{m_i} a_{ij}\, \phi_{ij}(x). \quad (13)$$
Substituting Eq (13) into Eq (12) for all $N$ training samples yields the linear system $Ha = Y$ (14), from which Eq (15) is obtained:
$$a = H^{\dagger} Y. \quad (15)$$
The parameters of FELM can be obtained by Eq (15).
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$. The above example illustrates the process of model learning. The steps of constructing and simplifying the FELM network and then performing parameter learning are as follows:
Step 1: Based on the characteristics of the problem to be solved, establish the initial network model;
Step 2: Write the functional equation corresponding to the initial network model;
Step 3: Use the functional equation solving method to solve the functional equation and obtain the general solution expression;
Step 4: Based on the general solution expression, use its one-to-one correspondence with the FELM to redraw the corresponding network (the simplified FELM);
Step 5: Use the FELM learning algorithm to obtain the optimal parameters of the model.
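Step 5 reduces to a single least-squares solve. The following sketch illustrates it for single-output regression, assuming the outer function $K$ is the identity; the function names and the polynomial basic-function family are our illustrative choices.

```python
import numpy as np

def felm_fit(X, y, basis):
    """FELM parameter learning: one non-iterative solve a = H† y,
    where H[j, i] = phi_i(x_j) stacks the basic functions over the samples."""
    H = np.column_stack([phi(X) for phi in basis])
    return np.linalg.pinv(H) @ y                 # coefficients via Moore-Penrose inverse

def felm_predict(X, basis, a):
    H = np.column_stack([phi(X) for phi in basis])
    return H @ a

# Recover y = 2 + 3x - x^3 exactly with the basic-function family {1, x, x^2, x^3}
basis = [lambda x: np.ones_like(x),
         lambda x: x,
         lambda x: x ** 2,
         lambda x: x ** 3]
X = np.linspace(-1, 1, 50)
y = 2 + 3 * X - X ** 3
a = felm_fit(X, y, basis)    # coefficients approach [2, 3, 0, -1]
```

Because the target lies in the span of the chosen basic functions, the coefficients are recovered exactly; in general the pseudoinverse yields the minimum-error fit, mirroring Eq (15).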

Performance evaluation
In this section, the performance of the proposed FELM learning algorithm is compared with commonly used network algorithms (ELM, OP-ELM, SVM, LS-SVM) on two artificial datasets, 20 benchmark datasets (16 for regression, 4 for classification) and the XOR classification problem, to verify the effectiveness and superiority of FELM.

Experimental environment for FELM and the comparison algorithms: 11th Gen Intel(R) Core(TM) i5-11320H @ 3.20 GHz, 16 GB RAM and MATLAB 2019b. ELM source code used in all experiments: http://www.ntu.edu.sg/home/egbhuang/; OP-ELM source code: https://research.cs.aalto.fi//aml/software.shtml; SVM source code: http://www.csie.ntu.edu.tw/cjlin/libsvm/; and the most popular LS-SVM implementation: http://www.esat.kuleuven.ac.be/sista/lssvmlab/. The sigmoidal activation function is used for ELM, the Gaussian kernel function for OP-ELM, and the radial basis function for SVM and LS-SVM. The basic functions of the proposed FELM are set according to the specific problem to be solved. In Sections 3.1 and 3.2, FELM adopts the network structure of Figure 3.

It is well known that the performance of SVM is sensitive to the combination of the cost parameter $C$ and the kernel parameter $\gamma$; similarly, the generalization performance of LS-SVM depends closely on $(C, \gamma)$. Therefore, to achieve good generalization performance, appropriate values of $C$ and $\gamma$ must be selected for SVM and LS-SVM on each dataset. We tried 17 different values of $C$ and 17 different values of $\gamma$, i.e., 289 pairs $(C, \gamma)$ per dataset. Each problem is tested 50 times, with the training and test sets randomly drawn from the entire dataset: two-thirds for training and the rest for testing.
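The $(C, \gamma)$ model selection described above can be sketched as a plain grid search. This illustration uses a simplified LS-SVM-style solve (kernel ridge form, no bias term) and 9 values per parameter instead of the paper's 17; all names and the toy data are our assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian/RBF kernel matrix exp(-gamma * ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lssvm_fit(X, y, C, gamma):
    """Simplified LS-SVM regression solve: (K + I/C) alpha = y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + np.eye(len(X)) / C, y)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (120, 1))
y = np.sinc(X[:, 0]) + 0.05 * rng.standard_normal(120)   # noisy toy target
Xtr, ytr, Xva, yva = X[:80], y[:80], X[80:], y[80:]      # simple train/validation split

grid = [2.0 ** k for k in range(-8, 9, 2)]               # stand-in for the 17-value grid
best = min(
    (np.sqrt(np.mean((rbf_kernel(Xva, Xtr, g) @ lssvm_fit(Xtr, ytr, c, g) - yva) ** 2)), c, g)
    for c in grid for g in grid
)
rmse, C_best, gamma_best = best                          # pair with lowest validation RMSE
```

The FELM and ELM comparisons need no such search, which is part of the training-time advantage reported later.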
This section gives the simulation results, including average training and test accuracy, corresponding standard deviation (Dev), and training time. In experiments, all inputs (attributes) and outputs (targets) have been normalized into [-1, 1].
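The normalization of attributes and targets into [-1, 1] mentioned above is a standard column-wise min-max scaling; a minimal sketch (function name ours):

```python
import numpy as np

def normalize_pm1(A):
    """Scale each column of A linearly into [-1, 1] via its own min and max."""
    lo, hi = A.min(axis=0), A.max(axis=0)
    return 2.0 * (A - lo) / (hi - lo) - 1.0

A = np.array([[0.0, 10.0],
              [5.0, 20.0],
              [10.0, 30.0]])
B = normalize_pm1(A)   # each column now spans exactly [-1, 1]
```

In practice the scaling parameters `lo`/`hi` computed on the training set would be reused for the test set so the two splits share one transformation.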

Artificial datasets
To test the performance of FELM on regression problems, we first use the 'sinc' objective function, defined as
$$y(x) = \operatorname{sinc}(x) = \begin{cases} \sin(x)/x, & x \neq 0, \\ 1, & x = 0. \end{cases}$$
To effectively reflect the performance of our algorithm, zero-mean Gaussian noise of several different forms is added to the training data points. In particular, we have the training samples $(x_i, y_i)$, $i = 1, 2, \ldots, N$.
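Generating such noisy training samples can be sketched as follows; the sampling interval [-10, 10], sample count and the single noise level `sigma` are our illustrative assumptions (the paper uses several noise types).

```python
import numpy as np

def make_sinc_samples(N=200, sigma=0.1, rng=np.random.default_rng(0)):
    """Training pairs (x_i, y_i): y = sin(x)/x (1 at x = 0) plus zero-mean Gaussian noise."""
    x = rng.uniform(-10, 10, N)
    safe_x = np.where(x == 0, 1.0, x)                 # avoid division by zero at x = 0
    clean = np.where(x == 0, 1.0, np.sin(x) / safe_x)
    return x, clean + rng.normal(0.0, sigma, N)

x, y = make_sinc_samples()
```

The regressors under comparison are then fitted to `(x, y)` and evaluated against the clean sinc curve.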
Next, we compare the performance of the proposed FELM with the other algorithms on the following two synthetic datasets.

As shown in Table 1, appropriate basic functions are assigned to the FELM algorithm on four different synthetic datasets. The initial number of nodes for ELM is 5, and the optimal number of hidden layer nodes is searched in steps of 5 within 5-100; the initial maximum number of neurons for OP-ELM is 100. The results of 50 experiments for all algorithms are shown in Table 2, where bold indicates the best test accuracy. Figure 5 plots single-run fitting curves of FELM and the other regressors on these synthetic datasets with different noise types.

As Table 2 shows, the proposed FELM learning algorithm achieves the best test accuracy (root mean square error, RMSE) on the artificial datasets with noise types A, B and D. On the artificial dataset with noise type C, FELM outperforms ELM, OP-ELM and LSSVR, and is second only to SVR. Table 2 also reports the optimal parameter combinations of SVR and LSSVR on these synthetic datasets, their required numbers of support vectors (SVs), and the network complexity (nodes) of FELM, ELM and OP-ELM. In addition, Table 2 compares the training and testing times of the five methods. The proposed FELM is the fastest learning method: several to dozens of times faster than ELM, and hundreds of times faster than OP-ELM, SVR and LSSVR. Compared with ELM, SVR and LSSVR, FELM has the smallest network complexity and therefore requires less learning time; compared with OP-ELM, FELM does not need to prune redundant nodes, so it requires less training time. Since the numbers of support vectors required by SVR and LSSVR are much larger than the network complexity of FELM, both take more test time than FELM, at least 60 times more, which means that a trained FELM deployed in practice may respond to new, unseen data much faster than SVM and LS-SVM.
In short, the proposed FELM outperforms the other four comparison algorithms in approximating the four artificial datasets with different types of noise.

Realistic regression problems
For further evaluation, 16 different regression datasets are selected. These datasets are usually used to test machine learning algorithms, mainly from UCI Machine Learning repository [70] and StatLib [71]. The different attributes of 16 datasets are summarized in Table 3.
As shown in Table 4, we assign appropriate basic functions to our FELM algorithm on 16 different datasets. It can be seen that these basic functions are relatively short in length, indicating that the structural complexity of the networks is low. The initial number of nodes in ELM is 5, and the optimal number of hidden layer nodes is found at intervals of 5 nodes within 5-100. The optimal number of nodes obtained by ELM on each dataset is shown in Table 5. The table also shows the best parameter combination and support vector number of SVR and LSSVR on each dataset, the initial maximum number of neurons and the number of neurons after pruning of OP-ELM.
The results of 50 trials on the 16 datasets by the proposed FELM and the comparison algorithms are shown in Tables 6-8; bold in Table 6 indicates the best test accuracy. The comparison of test RMSE is shown in Table 6: FELM obtains the minimum test RMSE on 10 datasets, namely Autoprice, Balloon, Baskball, Cleveland, Cloud, Diabetes, Machine CPU, Servo, Strike and Wisconsin B.C. On the other datasets, although the accuracy of our algorithm is lower than that of SVR and LSSVR, it is higher than that of ELM and OP-ELM. The comparison of training and testing times is shown in Table 7: our FELM spends training and testing time similar to ELM but much less than OP-ELM, SVR and LSSVR. The comparison on the standard deviation of test RMSE is shown in Table 8, which shows that FELM is a stable learning method. Figure 6 shows the test RMSE of FELM and the comparison algorithms over 50 runs on four datasets (Autoprice, Cleveland, Abalone and Quake). In short, combining Tables 6-8 and the visual comparison of Figure 6, the proposed FELM has good versatility and stability as well as fast training speed.
The initial maximum number of neurons for OP-ELM is 100. This section also adds a comparison with an ELM using the RBF activation function. The initial number of nodes of the ELM is 5, and the optimal number of hidden layer nodes is searched in steps of 10 within 5-1000; the optimal number of nodes obtained on each dataset is shown in Table 9. The table also shows the optimal parameter combination and number of support vectors of SVM and LSSVM on this problem, and the final number of neurons of OP-ELM.
The average results of 50 trials conducted by FELM and the other models on the XOR dataset are shown in Table 9. The data show that the performance of FELM is better than that of ELM, OP-ELM, SVM and LSSVM. Figure 8 shows the decision boundaries of the different classifiers on the XOR problem; like ELM, OP-ELM, SVM and LS-SVM, FELM solves the XOR problem well.

The newly proposed FELM algorithm is then compared with the four other popular algorithms (ELM, OP-ELM, SVM and LSSVM) on four classification problems: Iris, WDBC, Diabetes and Wine. These four datasets are from the UCI Machine Learning Repository [70]; their numbers of samples, attributes and classes are shown in Table 10. The ELM algorithm sets the initial number of nodes to 5 and searches for the optimal number of hidden layer nodes in steps of 10 within 5-1000. As shown in Table 11, we assign appropriate basic functions to these datasets for our FELM algorithm, and the table lists the optimal number of nodes obtained by ELM on each dataset. The table also shows the optimal parameter combination and number of support vectors of SVM and LSSVM, and the initial maximum number of neurons and the number of neurons after pruning for OP-ELM.
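To make the XOR result above concrete, the following hedged sketch shows why a basic-function family containing the product term $x_1 x_2$ suffices: with ±1 coding (our choice, not necessarily the paper's exact setup), XOR equals $-x_1 x_2$, so one non-iterative pseudoinverse solve classifies all four points.

```python
import numpy as np

# XOR in ±1 coding: the product term x1*x2 makes the classes separable
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
t = np.array([-1.0, 1.0, 1.0, -1.0])           # XOR labels in ±1 coding

# Basic-function family {1, x1, x2, x1*x2} stacked over the four samples
H = np.column_stack([np.ones(4), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
a = np.linalg.pinv(H) @ t                      # one non-iterative solve, a ≈ [0, 0, 0, -1]
pred = np.sign(H @ a)                          # classifies all four points correctly
```

A plain linear basis {1, x1, x2} cannot separate XOR, which is why the choice of basic functions per problem matters in FELM.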

Realistic classification problems
The performance comparison between all algorithms is shown in Tables 12-14, with better test results given in bold. In Table 12, our FELM is compared with the four other algorithms on test correct classification rate, and FELM achieves the highest rate. The comparison of training and testing times is shown in Table 13: the learning speed of FELM is similar to that of ELM and much faster than that of OP-ELM, SVM and LSSVM. In Table 14, FELM is compared with the other algorithms on the standard deviation of the test correct classification rate; the results show that FELM has good stability. Figure 9 shows the correct classification rates of FELM and the other four algorithms over 50 runs on the four classification datasets: FELM attains the highest classification rate most often, and its curve fluctuates less than those of the other algorithms, indicating better stability. In summary, combining Tables 12-14 and Figure 9, FELM not only maintains its learning speed in all cases but also achieves better generalization performance.
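For multi-class problems such as those above, a common setup is one output per class trained on one-hot targets with an argmax decision. The sketch below applies the FELM-style single solve to toy data under that coding; the blob data, quadratic basic-function family, and all names are our illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: three well-separated 2-D Gaussian blobs, 40 samples per class
X = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in ([0, 0], [2, 0], [0, 2])])
labels = np.repeat([0, 1, 2], 40)
T = np.eye(3)[labels]                          # one-hot targets, one column per class

def basis_matrix(X):
    """Quadratic basic-function family {1, x1, x2, x1^2, x2^2, x1*x2}."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

A = np.linalg.pinv(basis_matrix(X)) @ T        # one solve covers all class outputs
pred = np.argmax(basis_matrix(X) @ A, axis=1)  # argmax over class outputs
train_acc = np.mean(pred == labels)
```

The single pseudoinverse solve handles all class outputs at once, which is why the classification timings reported in Table 13 stay close to the regression case.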

Conclusions and future works
In this paper, we propose a new method for data regression and classification called the functional extreme learning machine (FELM). Different from the traditional ELM, FELM is problem-driven rather than model-driven and has no concept of weights and biases. It uses the functional neuron as its basic unit and uses functional equation solving theory to guide its modeling process. Each functional neuron is represented by a linear combination of linearly independent basic functions, and the model approximates the target to the desired accuracy by adjusting the coefficients of those basic functions. In addition, the fast parameter learning algorithm proposed in this paper requires no iteration and achieves high accuracy. Its learning process differs from that of the ELM currently in use, and it is expected to fundamentally overcome the shortcoming of current ELM theory that random initial hidden-layer parameters (connection weights, bias values, number of nodes) significantly affect classification accuracy. Like ELM, FELM requires little human intervention: it only needs suitable basic functions matched to the problem, and the optimal parameters are then obtained by the parameter learning algorithm without iteration. Simulation results show that, compared with ELM, FELM achieves better performance with a similar learning speed in regression and classification. Compared with SVM and LS-SVM, FELM runs stably with a much faster learning speed (up to several hundred times) while guaranteeing generalization performance. The proposed FELM theory provides a new idea for tapping the potential of extreme learning and broadening its applications, with important theoretical significance and broad application prospects. In future work, we will use a parameter screening algorithm to further improve the generalization ability and stability of FELM and broaden its range of practical applications.