Hypercomplex extreme learning machine with its application in multispectral palmprint recognition

An extreme learning machine (ELM) is a novel training method for single-hidden-layer feedforward neural networks (SLFNs) in which the hidden nodes are randomly assigned and fixed without iterative tuning. ELMs have attracted widespread interest due to their fast learning speed, satisfactory generalization ability and ease of implementation. In this paper, we extend this theory to hypercomplex space and attempt to consider multisource information simultaneously using a hypercomplex representation. To illustrate the performance of the proposed hypercomplex extreme learning machine (HELM), we have applied this scheme to the task of multispectral palmprint recognition. Images from different spectral bands are utilized to construct the hypercomplex space. Extensive experiments conducted on the PolyU and CASIA multispectral databases demonstrate that the HELM scheme can achieve competitive results. The source code and the datasets used in this paper are freely available for download at https://figshare.com/s/01aef7d48840afab9d6d.


Introduction
Machine learning plays an increasingly significant role in daily life, and a variety of machine learning areas have attracted great interest from researchers. Neural networks, a typical machine learning method, have proven to be a successful tool for artificial intelligence (AI). With the rapid development of computer hardware, deep neural network techniques have also achieved great success in various recognition tasks [1,2].
Recently, a machine learning theory called the extreme learning machine (ELM) was proposed by Huang et al. [3,4] and has attracted growing worldwide attention. It is a learning method for single-hidden-layer feedforward neural networks (SLFNs) that differs from the traditional back-propagation (BP) algorithm and its variants [5][6][7]. Unlike the BP-based learning approach, which employs an iterative process to tune the hidden nodes of an SLFN, the ELM method completes training without any repeated optimization steps. An ELM randomly assigns the input weights and biases of an SLFN and analytically calculates the output weights using a simple Moore-Penrose generalized inverse. This algorithm avoids many difficulties of conventional learning methods, such as the choice of stopping criteria, learning rates and learning epochs. ELM has been shown to be able to find a global optimal solution with excellent universal approximation ability and a very fast learning speed. Its advantages in computational cost and generalization performance have made it one of the most popular machine learning methods, with extensive and successful applications in classification, regression, clustering, compression and feature learning problems [8][9][10][11][12].
In the past decade, a variety of ELM variants have been proposed to address problems in the original theory; they have significantly enhanced the contributions of ELM to theoretical studies and engineering applications. For example, a regularized extreme learning machine [13] was investigated to solve the overfitting problem based on the structural risk minimization principle and weighted least squares. In [14], a fast and accurate online sequential learning variant was developed and applied to gesture recognition and object tracking; it showed excellent performance with regard to accuracy and computation time. In [15], the original ELM model was extended to handle classification and regression tasks with noisy or missing data. From the implementation aspect, a stacked ELM variant [16] was designed to make the method feasible for large data sets and real-time reasoning. In recent years, the ELM concept has been introduced into multilayer perceptrons [17,18]. As demonstrated in [17], compared with the greedy layerwise training of deep learning, the ELM-based framework has a substantially better learning efficiency.
Although ELM has been successfully applied in an extensive range of domains, it is primarily utilized for single-source information classification. In the case of multisource features, a fusion operation must be performed either at the feature level or at the matching score level [19][20][21] to achieve a final result. For feature-level fusion, the multisource features are simply concatenated. The ELM input is adjusted to the dimension of the joint features, which incurs a very large computational cost. For matching score level fusion, the features from each channel are processed separately by an ELM to obtain a matching score, and these scores are fused for the final decision. This strategy considers only the matching scores, which have already lost some discriminative information, so it cannot reach the accuracy of feature-level fusion. In this paper, we propose a hypercomplex extreme learning machine (HELM) that approaches the classification of multisource information from a different perspective. A hypercomplex representation [22,23] is introduced into the ELM theory. Multisource features are employed to construct the hypercomplex space, and hypercomplex operation rules are applied to determine the output weights of SLFNs. In addition, a fusion strategy is performed on the hypercomplex output nodes to obtain a decision.
As a typical multisource information processing problem, multispectral palmprint recognition has gained widespread attention in recent years. Some previous works processed the multisource information using a fusion operation. For example, Lu et al. [24] built an illumination-invariant palmprint recognition system by fusing the multispectral images at the image level, for which a FABEMD+WFC fusion framework was developed. Similarly, Xu et al. [25] fused the multispectral images using a digital shearlet transform based method and then classified the fused images with the extreme learning machine. Gumaei et al. [26] proposed a Gabor-based feature extraction method and employed the optimal spectral band to determine the identities. The same authors [27] further utilized a hybrid feature extraction method named HOG-SGF instead of the Gabor-based one to represent the multispectral palmprint images. Recently, Gumaei et al. [28] developed a new anti-spoof multispectral biometric cloud-based identification approach for privacy and security of cloud computing, in which a tree-complex wavelet transform was applied to complete the multispectral fusion task and Gabor features were used to represent the fused images. In contrast to these fusion-based approaches, in this paper we address the multispectral palmprint recognition problem with the proposed HELM framework, in which the fusion stage is circumvented. To evaluate the performance of the proposed method, we conduct experiments using the PolyU and CASIA multispectral palmprint databases [29][30][31][32][33]. Palmprint images from multiple spectral bands are employed to construct the hypercomplex representation.
The remainder of this paper is organized as follows: Section 2 provides a brief review of the ELM, describes the HELM theory and introduces the application of HELM in multispectral palmprint recognition. Section 3 illustrates the experimental results of the proposed method, which is tested using the PolyU and CASIA multispectral palmprint databases. Some concluding remarks are provided in the last section.

Extreme learning machine
The ELM is a novel learning method for SLFNs that randomly assigns the hidden layer and analytically determines the output weights of SLFNs. Consider N distinct training data $\{x_i, t_i\}$, $i = 1, 2, \ldots, N$, where $x_i$ is a 1×n input vector and $t_i$ is a 1×m output vector with only one entry (corresponding to the class to which $x_i$ belongs) equal to one. Here n is the dimension of the input data, and m is the number of classes. To train an SLFN with $\tilde{N}$ hidden nodes, the appropriate input weight vectors $\alpha_j$, $j = 1, 2, \ldots, \tilde{N}$, and output weight vectors $\beta_j$, $j = 1, 2, \ldots, \tilde{N}$, are required, such that

$$\sum_{j=1}^{\tilde{N}} g\left(x_i^{e} \alpha_j\right) \beta_j = t_i, \quad i = 1, 2, \ldots, N,$$

where $\alpha_j$ is an (n+1)×1 vector that connects the input nodes to the jth hidden node, $\beta_j$ is a 1×m vector that connects the jth hidden node to the output nodes, $x_i^{e}$ is the augmented vector of $x_i$ with the format $[x_i, 1] \in R^{n+1}$, and $g(\cdot)$ is the activation function.
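This formula can be compactly written as

$$H\beta = T, \tag{2}$$

where H is the $N \times \tilde{N}$ hidden layer output matrix, $\beta$ is the $\tilde{N} \times m$ matrix whose jth row is $\beta_j$, and T is the $N \times m$ matrix whose ith row is $t_i$.
Generally, a typical ELM training process consists of two main steps. The first step is to calculate the hidden layer output matrix with the random map $\alpha$ and a nonlinear piecewise continuous function, such as the following sigmoid function, sin function and atan function:
1. Sigmoid function: $H(i,j) = \dfrac{1}{1 + \exp\left(-x_i^{e} \alpha_j\right)}$
2. Sin function: $H(i,j) = \sin\left(x_i^{e} \alpha_j\right)$
3. Atan function: $H(i,j) = \operatorname{atan}\left(x_i^{e} \alpha_j\right)$
where H(i,j) is the value of H at the position (i, j).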
A remarkable characteristic of ELM is that the input weight matrix $\alpha$ of the hidden nodes can be randomly generated according to any continuous probability distribution, for example, the uniform distribution on [-1,1]. In this respect ELM differs distinctly from conventional feedforward neural networks. As demonstrated by Eq (2), the only parameters that need to be optimized in the training process are the output weights $\beta = [\beta_1; \beta_2; \ldots; \beta_{\tilde{N}}]_{\tilde{N} \times m}$ between the hidden nodes and the output nodes. Mathematically, training an SLFN by an ELM can be transformed into solving a regularized least squares problem, as illustrated in Eq (2). No additional iterative steps are required to tune the parameters of the SLFN, which makes ELM significantly more efficient than BP-like algorithms.
In the second step, ELM attempts to determine the output weights by minimizing the following loss function:

$$\min_{\beta} \left\| H\beta - T \right\|.$$

Huang et al. [4] proved that if the activation function g is infinitely differentiable, then for $N = \tilde{N}$ arbitrary distinct samples $\{x_i, t_i\}$, $i = 1, 2, \ldots, N$, and for any $\alpha$ randomly assigned according to any continuous probability distribution, the hidden layer output matrix H is invertible and $\|H\beta - T\| = 0$. Thus, the output weight matrix $\beta$ can be calculated by

$$\beta = H^{-1} T,$$

where $H^{-1}$ denotes the inverse matrix of H.
In most cases, the number $\tilde{N}$ of hidden nodes is significantly less than the number N of distinct training samples, so H is a non-square matrix and an inverse matrix for H does not exist. Huang has provided another method for finding the smallest norm least squares solution of Eq (2), that is,

$$\beta = \left( H^{T} H + C I \right)^{-1} H^{T} T,$$

where $H^{T}$ is the transpose of H, C is a penalty coefficient, and I is an identity matrix of size $\tilde{N} \times \tilde{N}$.
The procedure for training an SLFN using ELM theory is as follows:
Step 1: Initialize the number $\tilde{N}$ of hidden nodes; note that $\tilde{N} \le N$.
Step 2: Select the suitable activation function g.
Step 3: Randomly assign the input weight matrix α.
Step 4: Construct the output matrix H of the hidden layer.
Step 5: Calculate the output weight matrix β.
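The five steps above can be sketched in a few lines of pure Python. This is only an illustrative sketch, not the authors' MATLAB implementation: it assumes a sigmoid activation, input weights drawn uniformly from [-1,1], and the regularized solution $\beta = (H^{T}H + CI)^{-1}H^{T}T$ solved as a linear system. The function names (`hidden_output`, `solve`, `train_elm`, `predict`), the toy data and the default value of `C` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_output(X, alpha):
    """Step 4: H(i, j) = g([x_i, 1] . alpha_j) with a sigmoid g."""
    H = []
    for x in X:
        xe = list(x) + [1.0]  # augmented input [x_i, 1]
        H.append([sigmoid(sum(a * w for a, w in zip(xe, col))) for col in alpha])
    return H


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def train_elm(X, T, n_hidden, C=1e-6, seed=0):
    rng = random.Random(seed)
    n, m = len(X[0]), len(T[0])
    # Step 3: random input weights, one (n+1)-vector per hidden node.
    alpha = [[rng.uniform(-1.0, 1.0) for _ in range(n + 1)] for _ in range(n_hidden)]
    H = hidden_output(X, alpha)
    # Step 5 (assumed form): beta = (H^T H + C I)^{-1} H^T T.
    G = [[sum(h[a] * h[b] for h in H) + (C if a == b else 0.0)
          for b in range(n_hidden)] for a in range(n_hidden)]
    R = [[sum(h[a] * t[k] for h, t in zip(H, T)) for k in range(m)]
         for a in range(n_hidden)]
    return alpha, solve(G, R)


def predict(X, alpha, beta):
    """Return the index of the largest output node for each sample."""
    labels = []
    for h in hidden_output(X, alpha):
        out = [sum(hj * bj[k] for hj, bj in zip(h, beta)) for k in range(len(beta[0]))]
        labels.append(max(range(len(out)), key=out.__getitem__))
    return labels


# Toy data: two well-separated clusters with one-hot targets (hypothetical).
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0], [1.9, 2.0], [2.0, 1.9]]
T = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
alpha, beta = train_elm(X, T, n_hidden=4)
print(predict(X, alpha, beta))
```

Solving the linear system instead of forming an explicit inverse is a design choice of this sketch; it gives the same $\beta$ while avoiding a separate matrix inversion routine.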

Hypercomplex extreme learning machine
To classify multisource patterns, a variant of ELM, the hypercomplex extreme learning machine (HELM), is presented. Instead of using a fusion strategy to combine the multisource information, HELM is built on a hypercomplex representation, by which the model converts the multisource features into a hypercomplex space. It circumvents the design of fusion rules and therefore avoids the bias introduced by a designer's limited knowledge. In addition, HELM takes advantage of all multispectral images and learns the model parameters adaptively from the training data. Thus HELM could be more efficient and accurate than the fusion-based strategy.
To elaborate on the HELM model, we must first introduce some basic concepts of hypercomplex operations. Mathematically, a hypercomplex number is a linear combination of a real scalar and a fixed number d of imaginary units:

$$y = y^{(1)} + y^{(2)} e_1 + \cdots + y^{(d+1)} e_d,$$

where $y^{(1)}, y^{(2)}, \ldots, y^{(d+1)}$ are real numbers, and $e_1, e_2, \ldots, e_d$ are the imaginary units, which obey the multiplication rules of the hypercomplex algebra. $y^{*}$ denotes the conjugate of y and is calculated by

$$y^{*} = y^{(1)} - y^{(2)} e_1 - \cdots - y^{(d+1)} e_d.$$

The norm of a hypercomplex number is defined as

$$|y| = \sqrt{\sum_{k=1}^{d+1} \left( y^{(k)} \right)^{2}}.$$

HELM aims to extend the extreme learning theory to hypercomplex space. In the case of classification of multisource features, HELM utilizes each type of feature to construct a hypercomplex matrix. Then, the weights of an SLFN are analytically determined with the hypercomplex operation rules.
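As a small illustration of the conjugate and norm of a hypercomplex number, the sketch below stores a number as the list of its d+1 real coefficients, with the real part first. The list representation and the function names are assumptions made for this example only; multiplication is deliberately left out because it depends on the rules for the imaginary units.

```python
import math


def conjugate(y):
    """Keep the real part and negate every imaginary coefficient."""
    return [y[0]] + [-c for c in y[1:]]


def norm(y):
    """Square root of the sum of squared coefficients."""
    return math.sqrt(sum(c * c for c in y))


y = [1, 2, 2, 4]  # y = 1 + 2 e1 + 2 e2 + 4 e3, i.e. d = 3
print(conjugate(y), norm(y))
```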
Fig 1 shows the structure of the proposed HELM network, which primarily consists of four key stages: mapping the multisource features into the hidden layer using the randomly generated real input weights, constructing the hypercomplex hidden layer output matrix, calculating the hypercomplex output weight matrix, and performing a fusion strategy on the output nodes to achieve a final decision.
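For N distinct training samples with multisource features, i.e., $\{x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(d+1)}, t_i\}$, $i = 1, 2, \ldots, N$, where $x_i^{(j)} \in R^{1 \times n}$ denotes the jth attribute of sample i, the core task for HELM is to determine the input weights and output weights of an SLFN. Similar to the settings in the ELM, we take a randomly generated map as the input weights. Each attribute of the training samples is mapped into the hidden layer as

$$H^{(j)} = g\left( X^{e(j)} \alpha^{(j)} \right), \quad j = 1, 2, \ldots, d+1,$$

where $X^{e(j)}$ is the $N \times (n+1)$ matrix whose ith row is the augmented vector $[x_i^{(j)}, 1]$, g is applied elementwise, and $\alpha^{(j)} \in R^{(n+1) \times \tilde{N}}$ is the real input weight matrix for the jth attribute. The input layer and the hidden layer are connected with the set of real input weight matrices $\alpha^{(j)}$, $j = 1, 2, \ldots, d+1$.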
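A hypercomplex output matrix of the hidden layer is constructed using a hypercomplex representation, that is,

$$\hat{H} = H^{(1)} + H^{(2)} e_1 + \cdots + H^{(d+1)} e_d.$$

For sample i, different attributes share the same output vector $t_i$. Thus, the hypercomplex output vector can be constructed as

$$\hat{t}_i = t_i + t_i e_1 + \cdots + t_i e_d.$$

Over all samples, this can be compactly described as

$$\hat{T} = T^{(1)} + T^{(2)} e_1 + \cdots + T^{(d+1)} e_d,$$

where $T^{(j)}$ denotes the output matrix for the jth attribute, and $\hat{T}$ is the hypercomplex output matrix of an SLFN.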
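Having obtained the hypercomplex hidden layer output matrix $\hat{H}$ and the hypercomplex output matrix $\hat{T}$, we can solve for the hypercomplex output weight matrix $\hat{\beta}$ according to the following equations:

$$\hat{H} \hat{\beta} = \hat{T},$$

$$\hat{\beta} = \left( \hat{H}^{\dagger} \hat{H} + C \hat{I} \right)^{-1} \hat{H}^{\dagger} \hat{T}, \tag{18}$$

where $\hat{H}^{\dagger}$ is the hypercomplex transposition-conjugate matrix of $\hat{H}$, and $\hat{I}$ is the hypercomplex identity matrix of size $\tilde{N} \times \tilde{N}$.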
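Here $(\cdot)^{-1}$ is the hypercomplex matrix inversion, which is calculated by a blockwise recursion process described as

$$\begin{bmatrix} \hat{M}_{11} & \hat{M}_{12} \\ \hat{M}_{21} & \hat{M}_{22} \end{bmatrix}^{-1} = \begin{bmatrix} \hat{M}_{11}^{-1} + \hat{M}_{11}^{-1} \hat{M}_{12} \hat{S}^{-1} \hat{M}_{21} \hat{M}_{11}^{-1} & -\hat{M}_{11}^{-1} \hat{M}_{12} \hat{S}^{-1} \\ -\hat{S}^{-1} \hat{M}_{21} \hat{M}_{11}^{-1} & \hat{S}^{-1} \end{bmatrix}, \quad \hat{S} = \hat{M}_{22} - \hat{M}_{21} \hat{M}_{11}^{-1} \hat{M}_{12}, \tag{19}$$

where $\hat{M}_{11}$, $\hat{M}_{12}$, $\hat{M}_{21}$ and $\hat{M}_{22}$ are the hypercomplex matrix sub-blocks of arbitrary size. To be inverted, $\hat{M}_{11}$ must be square. If the number of rows (or columns) of $\hat{M}_{11}$ exceeds one, $\hat{M}_{11}^{-1}$ can be recursively solved by Eq (19). When $\hat{M}_{11}$ is a scalar, $\hat{M}_{11}^{-1}$ is computed by

$$\hat{M}_{11}^{-1} = \frac{\hat{M}_{11}^{*}}{\left| \hat{M}_{11} \right|^{2}},$$

where $\hat{M}_{11}^{*}$ is the conjugate of $\hat{M}_{11}$, and $|\hat{M}_{11}|$ is the norm of $\hat{M}_{11}$. The same procedures are performed in the calculation of $\hat{S}^{-1}$.
With Eq (18), the output weight matrix of the hidden layer is obtained. The hidden layer and the output layer are connected with hypercomplex weights. The input weights $\alpha^{(j)}$, $j = 1, 2, \ldots, d+1$, and the output weights of an SLFN are thus determined. The input weights are a series of real matrices, and the output weights are represented using a hypercomplex matrix.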
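The blockwise recursion can be illustrated on the simplest hypercomplex case, d = 1, where the entries are ordinary complex numbers and Python's built-in `complex` type supplies the arithmetic. The sketch assumes the standard 2×2 block inverse based on the Schur complement and the scalar rule "conjugate divided by squared norm"; it is not taken from the authors' code, and the function names are hypothetical. The products are written in an order that does not rely on commutativity.

```python
def scalar_inv(y):
    """Inverse of a nonzero scalar: conj(y) / |y|^2."""
    return y.conjugate() / (abs(y) ** 2)


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def matadd(P, Q, sign=1):
    return [[a + sign * b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def negate(P):
    return [[-a for a in row] for row in P]


def block_inv(M):
    """Invert a square matrix by recursive 2x2 block partitioning.

    No pivoting is done, so every leading block and Schur complement
    met during the recursion is assumed to be invertible.
    """
    n = len(M)
    if n == 1:
        return [[scalar_inv(M[0][0])]]
    k = n // 2
    A = [row[:k] for row in M[:k]]
    B = [row[k:] for row in M[:k]]
    C = [row[:k] for row in M[k:]]
    D = [row[k:] for row in M[k:]]
    Ai = block_inv(A)
    AiB = matmul(Ai, B)
    CAi = matmul(C, Ai)
    # Inverse of the Schur complement S = D - C A^{-1} B.
    Si = block_inv(matadd(D, matmul(C, AiB), sign=-1))
    top_left = matadd(Ai, matmul(matmul(AiB, Si), CAi))
    top_right = negate(matmul(AiB, Si))
    bottom_left = negate(matmul(Si, CAi))
    return ([tl + tr for tl, tr in zip(top_left, top_right)]
            + [bl + br for bl, br in zip(bottom_left, Si)])


M = [[4, 1, 0], [1, 3, 1j], [0, -1j, 2]]
Minv = block_inv(M)
print(matmul(M, Minv))  # close to the 3x3 identity up to rounding
```

For d > 1 the same recursion would apply once the scalar type implements the appropriate hypercomplex multiplication, which this sketch does not attempt.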
Once the training process is completed, a sum rule-based fusion strategy is performed on the hypercomplex output nodes, which considers the information from multisource features. Let $\widehat{nt} = nt^{(1)} + nt^{(2)} e_1 + \cdots + nt^{(d+1)} e_d$ denote the hypercomplex output of a HELM network for a new sample with multisource features $\{nx^{(1)}, nx^{(2)}, \ldots, nx^{(d+1)}\}$. The final fusion result can be achieved by

$$f(j) = \sum_{k=1}^{d+1} \frac{nt^{(k)}(j) - \mathrm{min\_t}^{(k)}(j)}{\mathrm{max\_t}^{(k)}(j) - \mathrm{min\_t}^{(k)}(j)},$$

where f(j) denotes the jth element of the fusion result $f \in R^{1 \times m}$, and $nt^{(k)}(j)$ denotes the jth element of $nt^{(k)}$. $\mathrm{min\_t}^{(k)}(j)$ and $\mathrm{max\_t}^{(k)}(j)$ are the minimum and maximum, respectively, of $t_i^{(k)}(j)$ over the training samples $i = 1, 2, \ldots, N$, where $t_i^{(k)}$ is the HELM network output of the training sample i for the kth attribute.
The procedure for training and testing an SLFN using HELM theory is as follows:
Step 1: Initialize the number $\tilde{N}$ of hidden nodes; note that $\tilde{N} \le N$.
Step 2: Select the suitable activation function g.
Step 3: Randomly assign the real input weight matrices $\alpha^{(j)}$, $j = 1, 2, \ldots, d+1$.
Step 4: Construct the hypercomplex output matrix $\hat{H}$ of the hidden layer.
Step 5: Calculate the hypercomplex output weight matrix $\hat{\beta}$.
Step 6: Obtain the fusion result using a sum rule.
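The sum rule of Step 6 can be sketched as follows. Each of the d+1 components of the hypercomplex output is min-max normalized with the range observed on the training outputs, and the normalized components are then added. Taking the class with the largest fused value as the decision is an assumption of this sketch, as are the function names and the nested-list layout `train_out[i][k][j]` (sample i, component k, class j).

```python
def train_ranges(train_out):
    """Per-component, per-class minimum and maximum over the training outputs."""
    n_comp, n_cls = len(train_out[0]), len(train_out[0][0])
    mins = [[min(s[k][j] for s in train_out) for j in range(n_cls)]
            for k in range(n_comp)]
    maxs = [[max(s[k][j] for s in train_out) for j in range(n_cls)]
            for k in range(n_comp)]
    return mins, maxs


def fuse_sum_rule(nt, mins, maxs):
    """f(j) = sum over k of (nt[k][j] - min) / (max - min)."""
    return [sum((nt[k][j] - mins[k][j]) / (maxs[k][j] - mins[k][j])
                for k in range(len(nt)))
            for j in range(len(nt[0]))]


# Two training samples, two components (d = 1), two classes (toy values).
train_out = [[[0.0, 0.0], [0.0, 0.0]],
             [[1.0, 2.0], [1.0, 4.0]]]
mins, maxs = train_ranges(train_out)
f = fuse_sum_rule([[0.5, 1.0], [0.0, 2.0]], mins, maxs)
label = max(range(len(f)), key=f.__getitem__)  # assumed decision rule
print(f, label)
```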

Multispectral palmprint recognition using HELM
To evaluate the performance of the proposed HELM network, we have applied it to multispectral palmprint recognition. Images captured from different spectral bands are taken as the multisource features. Fig 2 demonstrates a multispectral palmprint sample. Before the HELM network is used to classify the multispectral palmprint images, the intensity normalization process illustrated in Eq (24) must be applied to the palmprint images to remove the global intensity influence:

$$I'(x,y) = \frac{I(x,y) - \mathrm{min\_v}}{\mathrm{max\_v} - \mathrm{min\_v}}, \tag{24}$$

where I(x,y) denotes the pixel value of image I at position (x,y), $I'(x,y)$ is the normalized value, and min_v and max_v are the minimum and maximum, respectively, of all pixels in I. With all training data, we can follow the steps described in HELM theory to train an SLFN and complete the palmprint recognition task.
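A minimal sketch of the intensity normalization, assuming the usual min-max form that maps the darkest pixel to 0 and the brightest to 1. The image is represented as a list of rows; the function name and the handling of a completely flat image are choices made for this example.

```python
def normalize_intensity(img):
    """Min-max normalize a grayscale image given as a list of rows."""
    flat = [p for row in img for p in row]
    min_v, max_v = min(flat), max(flat)
    if max_v == min_v:
        # Flat image: no contrast to rescale (assumption: map to zeros).
        return [[0.0 for _ in row] for row in img]
    return [[(p - min_v) / (max_v - min_v) for p in row] for row in img]


print(normalize_intensity([[50, 100], [150, 250]]))
```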

Experimental results and performance analysis
In this section, we present the experimental results and assess the performance of the proposed HELM method. All experiments have been conducted on a computer with a 2.50 GHz Intel core processor and 8 GB memory. MATLAB 2017a was utilized as the simulation software.

Database description and evaluation criteria
To demonstrate the effectiveness of the proposed method, we conducted a series of experiments using the following two public multispectral palmprint databases.
The PolyU database [29][30][31][32] was created by the Hong Kong Polytechnic University. The database consists of 24000 palmprint images collected from 250 volunteers, comprising 195 males and 55 females. The volunteers ranged in age from 20 to 60 years. During the acquisition process, each volunteer was sampled 12 times in two separate sessions for both the left and right palms. The palmprint images were acquired at four spectral bands, i.e., Red, Green, Blue and NIR. For the convenience of researchers, the Hong Kong Polytechnic University provides the region of interest (ROI) images with the size 128×128.
The CASIA database [33] is provided by the Chinese Academy of Sciences' Institute of Automation. It has 7200 palmprint images in total, collected from 100 volunteers. The acquisition was performed in two separate sessions with a minimum time interval of one month. In each session, every volunteer was required to provide 3 samples for each of the left and right palms. Each sample was captured at the 460nm, 630nm, 700nm, 850nm, 940nm and white light (WHT) spectral bands. Fig 6 shows some multispectral palmprint images in the CASIA database.
The performance of the proposed method is evaluated in terms of recognition accuracy and computational cost. In the recognition process, a certain number of multispectral palmprint images are treated as the testing samples. If the determined class label of a testing sample is the same as its actual label, the sample is considered correctly recognized; otherwise, it is incorrectly recognized. The recognition accuracy is then defined as

$$\text{Accuracy} = \frac{N_c}{N} \times 100\%,$$

where $N_c$ is the number of correctly recognized samples in the testing group, and N is the number of samples in the testing group. The computational cost, including the training and testing time, is also used to compare the performance of the methods. The training time is the time taken to construct the HELM model from the training data, and the testing time is the time taken to determine the class labels of the testing samples using the trained HELM model.
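The accuracy measure amounts to counting matching labels. A short sketch, with a hypothetical function name and toy labels:

```python
def recognition_accuracy(predicted, actual):
    """Percentage of testing samples whose predicted label matches the actual one."""
    n_correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * n_correct / len(actual)


print(recognition_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))
```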

Result analysis of the proposed method
To complete the training of the HELM network and achieve excellent generalization performance, the penalty coefficient C and the number $\tilde{N}$ of hidden nodes need to be appropriately selected.
It is clear from the figure that, whichever database the experiments are conducted on, for a given penalty coefficient C the recognition accuracy tends to increase as the number $\tilde{N}$ of hidden nodes increases. It converges to the optimal accuracy when the value of $\tilde{N}$ is sufficiently large. We observe that a smaller value of C yields a higher recognition accuracy. When C is zero, the best results are generated.
Considering the randomness of the HELM training method, ten repeated experiments were performed. We also tested the performance of the HELM network with different activation functions. The sigmoid, sin and atan functions were compared to determine which function can achieve the optimal result. Table 1 lists the recognition accuracies in the ten repetitions.
Based on these repeated results, Table 2 gives the statistical comparison obtained with the one-sided two-sample Student T-test. The null hypothesis is that the two independent samples come from normal distributions with equal means and variances. The value of the test statistic can be calculated as

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1) S_1^{2} + (n_2 - 1) S_2^{2}}{n_1 + n_2 - 2} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}, \tag{26}$$

where $\bar{X}_1$ and $\bar{X}_2$ denote the means of the two series of measurements, $S_1^{2}$ and $S_2^{2}$ denote the corresponding variances, and $n_1$ and $n_2$ are the numbers of measurements in each series. The number of degrees of freedom for the two-sample T-test is $n_1 + n_2 - 2$. To complete this statistical comparison, the built-in MATLAB function "ttest2(x, y, α, 'right')" is used. Here, x and y denote the two series of measurements, α is the significance level, and 'right' denotes the right-sided test.
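The pooled-variance test statistic can be reproduced with the Python standard library. The sketch below assumes the equal-variance form of the two-sample statistic with $n_1 + n_2 - 2$ degrees of freedom; it returns only t and the degrees of freedom, since the p-value requires the Student t distribution, which the standard library does not provide. The function name and sample values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    dof = n1 + n2 - 2
    # variance() is the sample variance (divides by n - 1).
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / dof
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, dof


t, dof = pooled_t_statistic([2, 4, 6], [1, 2, 3])
print(t, dof)
```

A large positive t supports the right-sided alternative that the first series has the larger mean.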
In Table 2, t denotes the value of the test statistic, p denotes the probability of observing the given result if the null hypothesis $H_0$ is true, and T-test denotes the test result. The significance level is set to α = 0.05. Comparing the sigmoid and sin activation functions, the p values for the PolyU and CASIA databases are much less than the significance level α. Thus the null hypothesis $H_0$ is rejected and the alternative hypothesis $H_1$ is accepted, meaning that the sigmoid function produces a higher recognition accuracy than the sin function. Similarly, the T-test between the sigmoid and atan functions leads to the inference that the sigmoid function outperforms the atan function. The comparison between the sin and atan functions gives different test results on the two databases: for the CASIA database, the performance of the atan function is not significantly better than that of the sin function. It can be concluded that for the two databases, the sigmoid function consistently produces the highest recognition accuracy among the three types of activation functions. In the ten repeated measurements for the sigmoid function, the best recognition accuracies for the PolyU database and the CASIA database are 100% and 98.50%, respectively.
To evaluate the performance of the HELM network for different combinations of input spectral bands, a series of experiments were conducted using different hypercomplex representations, i.e., $y = y^{(1)} + y^{(2)} e_1$, $y = y^{(1)} + y^{(2)} e_1 + y^{(3)} e_2$, $y = y^{(1)} + y^{(2)} e_1 + y^{(3)} e_2 + y^{(4)} e_3$ or $y = y^{(1)} + y^{(2)} e_1 + y^{(3)} e_2 + y^{(4)} e_3 + y^{(5)} e_4$. With a single input spectral band, the HELM degenerates into a traditional ELM network. Considering the randomness in the training of the HELM network, ten repeated runs were performed. The average accuracy and the corresponding standard deviation were used for the assessment. We also applied the three activation functions in the network. Tables 3 and 4 give the experimental results of different spectral combinations when testing the HELM method on the PolyU and CASIA multispectral databases. As shown in Table 3, the recognition accuracies of the experiments with more than one input spectral band are higher than those of the experiments with a single spectral band. The recognition results based on the HELM network are satisfactory when the number of input spectral bands is two or three, which suggests that the proposed HELM network is applicable to any spectral band combination. We also observe that the sigmoid function achieves the best results among the three activation functions when the number of input spectral bands is two or three. In the case of a single spectral band, the atan function obtains the best results for the ELM model. Similar conclusions can be drawn from Table 4. For the CASIA database, using the proposed HELM model with multispectral palmprint images clearly improves the performance of palmprint recognition. The sigmoid activation function provides the best recognition accuracies for the experiments with more than one input spectral band.
HELM employs a hypercomplex representation to complete the classification task of multisource features. To verify its effectiveness, a comparison was performed with two different strategies by which an ELM can process multisource features, i.e., fusing the multisource features either at the feature level or at the matching score level. These methods were compared in terms of computational cost and recognition accuracy. Again, ten repeated runs were performed, and the time and accuracy were used in the assessment. As reported in Table 5, for either benchmark database, the hypercomplex representation-based method obtains the highest recognition accuracy and maintains a distinct advantage over the other two strategies. Regarding computation time, although the hypercomplex representation-based method cannot compete with the feature level fusion-based method in terms of testing time, it requires the lowest training time and provides the highest recognition accuracy. Overall, the hypercomplex representation strategy outperforms both of the comparison methods.
A comparison was made with some state-of-the-art multispectral palmprint recognition methods, including two image level fusion methods, two matching score level fusion methods, a QPCA+QDWT method and two improved ELM-based methods. In addition, we investigated the performance of the HELM model when using different features as the input. A dimensionality reduction method, PCA [34], and a texture feature extraction method, LBP [35], were employed to extract the palmprint features. The experiments were conducted on the original PolyU and CASIA databases as well as on noisy versions generated manually by introducing different kinds of noise. Fig 8 demonstrates the manually generated palmprint samples used in these experiments. Gaussian white noise with mean 0 and standard deviation 36, Salt & Pepper noise with 10% noise density and Speckle noise with variance 0.05 were used, respectively, to generate the noisy palmprint images. Table 6 lists the recognition accuracies of the comparison methods. The HELM-based and PCA+HELM-based multispectral palmprint recognition methods consistently outperform the fusion-related methods and the QPCA+QDWT method on the testing databases. Although the two improved ELM-based methods achieve quite satisfactory results on the PolyU database, their performance degrades when they are tested on the CASIA database. The LBP+HELM method produces the highest recognition accuracies on the original PolyU and CASIA databases (100% and 99.83%). However, its recognition accuracies decrease seriously when the images are corrupted with noise. This is because the LBP features depend on the local structure of images and are very sensitive to variations in pixel values. Table 7 gives the statistical comparison of the ten methods in Table 6 using the one-sided two-sample Student T-test. The significance level is set to α = 0.05.
Here, the meanings of t, p and T-test are the same as those in Table 2, and the value of t is calculated as shown in Eq (26). IPCA denotes the method of "Image level fusion by PCA". MPCA denotes the method of "Matching score level fusion by PCA". IDWT denotes the method of "Image level fusion by DWT". MDWT denotes the method of "Matching score level fusion by DWT". QPCA denotes the method of "QPCA+QDWT". Comparing the HELM method (or the PCA+HELM method) with the fusion-related methods, the QPCA+QDWT method and the improved ELM-based methods, we find that the values of p are clearly less than the significance level α. Therefore the alternative hypothesis $H_1$ is accepted. That is to say, the HELM method (or the PCA+HELM method) achieves higher recognition accuracies than the comparison methods from a statistical viewpoint. In addition, the LBP+HELM method is not significantly better than the comparison methods because of the noise effect. As for the Student T-test among the three HELM-related methods, the test results show that the PCA+HELM method produces the highest recognition accuracies.

Conclusions
In this paper, we have proposed HELM, which is a novel learning method for SLFNs. HELM introduces the hypercomplex representation concept into ELM theory. Compared with the conventional ELM model, the proposed method retains all the merits of ELM, such as fast learning speed, excellent generalization ability and ease of implementation. HELM can easily complete the classification task of multisource features by benefitting from the hypercomplex representation. We have applied this method to the task of multispectral palmprint recognition to verify its actual performance. Comprehensive experiments carried out on the PolyU and CASIA multispectral palmprint databases have demonstrated that the proposed HELM network can obtain favorable results compared with several state-of-the-art multispectral palmprint recognition methods.