Arrangements of Resting State Electroencephalography as the Input to Convolutional Neural Network for Biometric Identification

Biometric is an important field that enables identification of an individual to access their sensitive information and asset. In recent years, electroencephalography- (EEG-) based biometrics have been popularly explored by researchers because EEG is able to distinct between two individuals. The literature reviews have shown that convolutional neural network (CNN) is one of the classification approaches that can avoid the complex stages of preprocessing, feature extraction, and feature selection. Therefore, CNN is suggested to be one of the efficient classifiers for biometric identification. Conventionally, input to CNN can be in image or matrix form. The objective of this paper is to explore the arrangement of EEG for CNN input to investigate the most suitable input arrangement of EEG towards the performance of EEG-based identification. EEG datasets that are used in this paper are resting state eyes open (REO) and resting state eyes close (REC) EEG. Six types of data arrangement are compared in this paper. They are matrix of amplitude versus time, matrix of energy versus time, matrix of amplitude versus time for rearranged channels, image of amplitude versus time, image of energy versus time, and image of amplitude versus time for rearranged channels. It was found that the matrix of amplitude versus time for each rearranged channels using the combination of REC and REO performed the best for biometric identification, achieving validation accuracy and test accuracy of 83.21% and 79.08%, respectively.


Introduction
Progress in information technology makes security a crucial aspect in protecting personal details and information. erefore, authentication is needed to allow the correct individual to access this information [1]. Authentication can be divided into three categories: the knowledge-based, tokenbased, and biometric-based approaches [2]. Knowledge-based approach depends on information that has been set by users, such as personal identification number (PIN) and textual password. Token-based approach uses an object that a person owns, such as smart card and passport. Knowledge-based and token-based approaches have disadvantages in which the identifiers may be forgotten, misplaced, or stolen [3]. Pass card and password that are stolen can cause losses in financial and intellectual property. For example, perpetrators of fraud may attempt to obtain sensitive information such as passwords and credit card details for malicious reasons. In order to overcome these issues, a biometric-based approach has been introduced [4].
Biometrics-based approach depends on a person's identity [5].
e approach enables authentication based on physiological or behavioral features to recognize an individual, which cannot be replicated or stolen. Biometrics can be further divided into two categories, which are conventional biometrics [6] and cognitive biometrics [7]. Conventional biometrics use the physiological properties of the individual, including fingerprint, palm print, gait, and iris scan. Behavior characteristics such as voice and signature are also parts of conventional biometrics. In contrast with conventional biometrics, cognitive biometrics measures the human brain signals during emotional and cognitive brain conditions. e recorded signals are used as biometric traits. Human brain plays an important role in controlling the actions and behavior of individuals. Every movement of our human body reflects the signals which are sent by the human brain [8]. In order to pick up signals from the human brain, different modalities are used to study the differences of brain signals to trigger various actions. Examples of brain signal representation that can be seen are functional magnetic resonance imaging (fMRI) and electroencephalography (EEG). Nevertheless, EEG has higher temporal resolution [9] and is able to directly measure brain activity. erefore, EEG is a better option in cognitive biometrics as the procedures to obtain EEG are relatively practical and cheap. In addition, EEG is able to provide important information that can discriminate two individuals [10]. e analysis of EEG is more consistent as the signals recorded are more substantial and explicit [11]. Besides, EEG is universal, unique, and robust, making it suitable to be used as cognitive measures for biometric identification [12].
During the recording of EEG, certain tasks will be carried out by the subject in order to evoke responses from the human brain. e responses are picked up by the EEG. In recent years, resting state eyes closed (REC) and resting state eyes open (REO) are implemented in biometrics. Subjects closed their eyes when the EEG are recorded (i.e., REC), followed by eyes open (i.e., REO). Both REC and REO tasks are performed in resting state. REC and REO are commonly used as it is able to differentiate brain conditions of different individuals [13].
In the work by Choi et al. [14], they extracted features from the alpha activity of the EEG during REC and REO, as alpha power becomes stronger when eyes are closed [15,16].
eir experimental results showed that the spatiospectral patterns of changed alpha activity are different for individuals and can perform identification efficiently. In the work of omas et al. [17], sample entropy features are extracted from delta, theta, alpha, beta, and gamma bands of the REC and REO EEG. It was shown that beta band entropy has the highest intersubject variability. In their work, power spectral density (PSD) is concatenated with the entropy features and improves the performance. In the work of Suppiah et al. [18], PSD was also extracted from REC and REO EEG as features, and a Fischer linear discriminant classifier was trained to perform identification. In addition, Lee et al. [19] also extracted spectral power, maximum power, and frequency of maximum power in the alpha band from the REC EEG in their design of the biometric authentication system. Fraschini et al. [20] proposed an REC REO EEG-based biometric system, which makes use of eigenvector centrality. eir work reported that the resting state functional brain network provides a better classification than only using a measure of functional connectivity. eir report strengthens the usability of resting state EEG as a biometric measure. It is observed from previous studies that resting state EEG provides important information which can differentiate individuals. In addition, resting state EEG can be applied to individuals who are severely ill as it only required their EEG acquisition in their resting state for biometric confirmation. From the review of many works, feature extraction is a complicated task as important information has to be selected to represent the individuals.
Based on the literature, we found that the general framework for biometric identification is as shown in Figure 1, which can be divided into four stages. e first stage is the preprocessing stage, where the raw signal will be preprocessed in order to remove unwanted elements such as noise and artifacts. e second stage is the feature extraction stage. In this stage, features are extracted from the preprocessed signal and then consequently used to train a classifier. If the size of the feature set is too huge, the feature dimension reduction will also be carried out at this stage. In the third stage, the feature set extracted from the second stage will be used to train a classifier. In the fourth stage, the trained classifier is used to perform classification for the input EEG.
ere are several works that have been proposed for the EEG-based biometric authentication system using the framework in Figure 1. In the work of Koike-Akino et al. [21], blind source separation canonical correlation analysis (BSS-CCA) is used to preprocess the signal to remove ocular artifacts. Due to the huge dimensionality of their features, principal component analysis is used to rank and select the representative features from the feature pool. In their work, several machine learning classifiers were evaluated, and it was found that quadratic discriminant analysis (QDA) presents the best classification accuracy. He et al. [22] extract multivariate autoregressive (mAR) coefficients from multiple EEG channels as features. e extracted features are further hashed using the Fast Johnson-Lindenstrauss Transform-(FJLT-) based hashing algorithm to obtain compact hash vectors. e hash vectors are then used to train a Naive Bayes probabilistic model for identification.
Reshmi et al. [23] preprocessed the raw EEG signal using band pass filtering, baseline removal, detrending, and artifact removal. is is to remove unwanted components in the signal. Next, independent component analysis (ICA) is used to select the useful EEG components. e wavelet transform is used on the resultant signal to reveal the discriminative characteristic. e wavelet is used to train an artificial neural network (ANN) for identification. In the work by omas et al. [24], EEG signals are filtered using a zero-phase Butterworth filter and divided into five subfrequency bands. Entropy is computed from the resulting signals as features and used to train a Mahalanobis distance-based classifier.
e preprocessing is crucial to remove all the unwanted elements in a signal. However, it is time consuming to locate and remove the impurities in the signal. Impurities such as noises and artifacts can affect the training of classifiers and reduce the classification accuracy. In addition, determining the important features will also take up experimental time. In order to overcome the complex design of preprocessing, feature extraction and feature selection, CNN is one of the common methods used in development that requires classification [25].
CNN is a machine learning method which is inspired from the biological system [26]. e architecture is made up of multilayer perception (MLP) which consists of multiple hidden layers, combining the convolution layer and the conventional back propagation neural network dense layer. e hidden layers including the convolutional kernel in the CNN carry learnable parameters which require multiple iterations of learning and validation to determine the optimum value empirically [27]. e convolutional layers play the role of extracting important features from the input matrix through the weighted learnable kernels [28]. Each forward input of the matrix computes a feature map. e convolutional layers learn to activate the feature maps when the patterns of interest are detected in the input. Activated feature maps will be downsampled by using the pooling layer and further feed forward to the next layers. Fully connected layer (also known as dense layer) is trained using the feature map.
e learning process of the learnable parameters implies backpropagation [29] and gradient decent [30].
An example of CNN structure is shown in Figure 2. e first convolutional layer will extract the features to form feature map from the output, followed by the first pooling layer. e first pooling layer will pool the features together and direct them into the second convolutional layer. e second convolutional layer extracts features to form the second feature map. Consequently, the second pooling layer will pool the features and direct it to the fully connected layer. From the fully connected layer, classification will be made and the input will be categorized into their labeled classes.
In recent years, CNN is frequently used in an EEG-based identification task [31][32][33][34]. In these approaches, input EEG is arranged in matrix form of amplitude versus time for every channel of EEG. e input matrix of amplitude versus time for every channel is a direct and convenient way to prepare the signals that can save input data preparation time. In the work by Ma et al. [34], an EEG-based biometric recognition was developed using CNN. In their work, resting state EEG is used as the input of CNN.
Resting state EEG lacks task-related features; hence, it is hard to perform feature extraction. CNN is one of the best used methods, which does not require preprocessing and feature extraction beforehand in order to obtain the feature set for the training of the classifier. In the work of Ma et al. [34], the CNN architecture used is shallow, which can avoid the possibility of overfitting. At the same time, their framework presents a high degree of accuracy for identification.
However, the input matrix of amplitude versus time might not be distinctive enough to represent the information in the signal to train the CNN. Besides the direct input of the raw EEG, there is no implementation of batch normalization in the work of Ma et al. [34] despite batch normalization is shown to be able to improve the performance of CNN [29]. e objective of this paper is to investigate the most suitable input to represent the EEG for the training of CNN. Six methods of organizing the input EEG are compared using the same data set and CNN architecture as in the work by Ma et al. [34]. e paper is divided into four sections. Section 2 presents the methodology. Next, Section 3 presents the experimental results and discussion. Finally, the conclusion is given in Section 4.

Methodology
In this section, the methodology to investigate the most suitable input type of EEG and the effect of batch normalization are explained. In this work, six types of input of EEG are compared. e details of the input types will be explained in Section 2.3. In addition, the necessity of batch normalization is evaluated on all of the six input types. e input type of the work by Ma et al. [34] is used as the benchmark.
is section is divided into five subsections. Section 2.1 will discuss the CNN architecture used in this study. Section 2.2 presents the dataset used and the division of the dataset into training and testing sets. Section 2.3 explains the data preparation of EEG into six input types, and Section 2.4 is about batch normalization.

Convolutional Neural Network
Architecture. For this biometric identification system, we have chosen the CNN architecture by Ma et al. [34] because their design of CNN is shallow, yet presents a high identification accuracy. In the work by Ma et al. [34], there is no implementation of batch normalization despite batch normalization was shown to be able to improve the performance of CNN [29]. erefore, in this work, batch normalization will be included in the CNN architecture.
is CNN architecture is implemented to evaluate the performance of all six input types of EEG which will be further explained in Section 2.3. e CNN architecture is shown in Table 1 and Figure 3. e CNN is made up of five layers, including two convolutional layers, two pooling layers, and one fully connected layer. e input with size 64 × 160 will be directed to the first convolutional layer made up of six 5 × 5 filters, resulting an output of 60 × 156 × 6. e resulting feature map is then directed into an average pooling layer of size 2 × 2, producing an output of 30 × 78 × 6. e feature map is then directed to the second convolution layer made up of six 5 × 5 filters, outputting a feature map of 26 × 74 × 6. After going through an average pooling of size 2 × 2, an output of 13 × 37 × 6 is produced.
is output is then flattened and directed into the fully connected layer. Softmax function is used as the activation function.

Computational Intelligence and Neuroscience
In this study, there are seven parameters that are fixed for a fair evaluation. ese parameters are presented in Table 2. e learning rate of the CNN is set to 0.001 and remains constant throughout the training. In this study, the necessity of batch normalization is evaluated, and the result is shown in Section 3. L 2 is used for batch normalization. e mini-batch size is set at 100 for experiments using REC or REO data set, while 200 are used for the combination of REC + REO. is is to allow a fair comparison of forming equal amount of epoch for training. e training repetition is fixed with 30 repetitions for all experiments. L 2 regularization is used with the regularization factor of 0.0005. e optimizer used is the stochastic gradient decent with a momentum of 0.9.

Dataset.
In this study, both REO and REC data used are from an open source dataset available on the website [35], which is the same dataset used in the work by Ma et al. [34]. e signals from the data set are recorded using 64-channel BCI2000 system with a sampling rate of 160 Hz. e dataset is divided into training set, testing set, and validation set. A total of 109 subjects' data are used under both REO and REC  conditions. Recording time of the first 60 seconds is used for each condition. is is because the presence of more discriminating characteristics of the EEG is close to the beginning of the recording [36]. is signal is then divided into 60 subsets of one second each. After this division, 25 subsets will be used for training, five will be used for validation, and 30 will be used for testing. ere will be three sample sets used, which are REO, REC, and REO combined with REC. ere are a total of 2725 training samples, 545 validation samples, and 3270 testing samples for REC and REO, whereas for a combination of the REC and REO, it will have 5450 training samples, 1090 validation samples, and 6540 testing samples. In this study, subjects that are included in the dataset will be used for classification. Additional impostor subjects are not included.

Data Preparation.
ere are six types of input methods of the EEG compared in this study (Table 3). ree matrices and three images are compared. e three matrices are stored as images because images have manageable storage size.

Matrix of Amplitude versus Time, M 1 .
e first type of input is a matrix of amplitude versus time M 1 , as in the work of Ma et al. [34]. e arrangement of channels will use the default arrangement given by the dataset [35], as shown in Figure 4. e matrix size is N × F s , where N is the number of channels and F s is the sampling frequency. In this case, the matrix size will be 64 × 160 because the data are partitioned into one second segment, the number of channels is 64, and the sampling rate of EEG recording is 160 Hz. e components in the matrix are stored from the EEG by using the formula: where i is the channel of the sampling point, t is the time of the sampling point, and x i (t) is the amplitude of the sampling point of channel i at time t. e example for matrix M 1 is shown in Figure 5.

Matrix of Energy versus Time, M 2 .
e second type of input data is the matrix of time versus energy M 2 . is input data matrix can be seen in the work of Sakhavil et al. [37] of motor imagery classification using CNN but is never used yet for EEG biometric identification. e Hilbert transform is applied on the input signal x(t) and computes the instantaneous energy for each of the time points in one second of EEG. erefore, the matrix is 64 × 160. e Hilbert transform of the time series is calculated using the formula [38]: where x i (t) is the input time series and PV is the Cauchy principal value. Next, the analytic EEG signal z i (t) is build from the Hilbert transform using the formula [38]: where the real part of the signal x i (t) is the original value of the time point. e imaginary part jH(x i (t)) is the Hilbert transform of the time point. e energy for channel i at a certain time point t, E i,t , is computed using the formula: where H(x i (t)) is the Hilbert transformed signal. e computed energy is stored in matrix M 2 using the formula: e example for matrix M 2 is shown in Figure 6.

Matrix of Amplitude versus Time for Rearranged
Channels, M 3 . e third type of input data matrix is the rearrangement of channels according to the Pearson correlation coefficient, which was proposed by Wen et al. [39] for emotion recognition using CNN. Similar to M 2 , this data matrix has not been used in EEG biometric identification. e Pearson correlation coefficient is the statistical value of linear correlation between two variables [40]. In their application, it is used to evaluate the relevant information between two electrodes (known as channel). e value of the coefficient ranges from −1 to 1, showing the negative linear correlation and positive linear correlation. Value 0 means there is no correlation between two variables. e Pearson correlation coefficient is computed for EEG using the formula: where ρ xy is the computed coefficient, x i and y i are the two signals from different channels, x is the mean of signal x i , and y is the mean of signal y i . e rearrangement of channels started by placing channel one and the most correlated channel on the top most of the matrix. is step is then repeated for the rest of the channels, concatenating correlated channels next to each other. After the rearrangement, the amplitude of each sampling points is stored in matrix M 3 using equation (1). e matrix size will be 64 × 160. e example for matrix M 3 is shown in Figure 7.
where i is the channel of the sampling point, t is the time of the sampling point, M 1 max is the maximum value in M 1 , and M 1 min is the minimum value in M 1 . e example for image I 1 is shown in Figure 8.

Image of Energy versus Time, I 2 .
To form an image of matrix M 2 , the values in the matrix are scaled from 0 to 255 and stored in image I 2 using the formula: where i is the channel of the sampling point, t is the time of the sampling point, M 2 max is the maximum value in M 2 , and M 2 min is the minimum value in M 2 . e example for image I 2 is shown in Figure 9. Similarly to image I 2 , there are no existing works in EEG-based identification that computes energy from EEG and stored in image format. It is suggested that the image format is easier for data storage and management.

Image of Amplitude versus Time for Rearranged
Channels, I 3 . By referring to matrix M 3 , image I 3 is formed.
In the literature, there are no approaches in EEG-based identification that stored EEG matrices in image format. It is predictable that image of the EEG can ease researchers to manage the storage in a convenient way. e values in the matrix are scaled from 0 to 255 and stored in image I 3 using the formula: where i is the channel of the sampling point, t is the time of the sampling point, M 3 max is the maximum value in M 3 , and M 3 min is the minimum value in M 3 . e example for image I 3 is shown in Figure 10.

Batch Normalization.
In the work of LeCun et al. [29], batch normalization is introduced for network regularization [41]. In their work, batch normalization is applied to avoid overfitting of the network and to have better conservation of information throughout the training process. Firstly, mean μ β and variance σ 2 i are calculated for the inputs mini-batch x i to the batch normalization layer. Next, the mean and variance are used to calculate the normalized activations x i using the formula: where ϵ is a constant that improves numerical stability when the variance of the mini-batch is small. In order to allow inputs with zero mean and unit variance that are not optimal for the layer, the batch normalization layer shifts and scales the activations. e scaled activation z i is calculated using the formula: where β is the offset and c is the scale factor. Both the parameters are learnable during training of the CNN. When the training is done, mean and variance are calculated for the   Computational Intelligence and Neuroscience full training set and then stored. While predicting using the trained CNN, the trained mean and variance will be used to normalize the activation.
Experiments using six types of input data are repeated by adding batch normalization layers in between the convolution layer and pooling layer.

Results and Discussion
e performance of each type of input is measured using two identification accuracies in terms of percentage, which are the validation accuracy and testing accuracy. e identification accuracies are obtained using a threefold cross validation. Both of the identification accuracies are calculated using the formula: where C P is the correct prediction and N P is the number of prediction done. Each of the identification accuracies is tabulated. First, Table 4 shows the identification accuracy using the REC dataset. Among the input types, matrix M 1 showed the highest validation and test accuracies, which are 74.86% and 67.04%, respectively. e image I 1 presents validation and test accuracies which are slightly below matrix M 1 . is scenario shows that there was some information lost in between the conversion of matrix to an image. On the other hand, converting M 2 into image I 2 improved the validation and test accuracies. e values of power computed are large, costing complex calculation in the training of CNN. e conversion of matrix to image scaled the values into the range from 0 to 255, which normalized the larger values in the matrix.
At the same time, it was found that rearrangement of the channels according to the correlation is ineffective in representing the EEG. e implementation of image and matrix both shows validation and test accuracies lower than 40%. It is worth noting that channel reorganization alters the 2D spatial information presented by the matrix or the image. As our convolutional layers in CNN are using 2D filters, different features are extracted by these layers from the rearrangement of the channels, as compared to those extracted from the original EEG sequence. During the rearrangement of channels, important edges and patterns represented by the original EEG are lost, and there are no discriminating features that can be extracted during the training of CNN, thus causing low validation and testing accuracy. e identification accuracy using the REO dataset is tabulated in Table 5. From Table 5, matrix M 1 presented the highest validation and test accuracy. Mapping the original data from EEG into a matrix preserves the information for each individual. Similarly, image I 2 has higher validation and test accuracy than matrix M 2 due to the normalized values in the matrix-to-image conversion. Rearranging the channels of REO EEG shows similar validation and test accuracy with the REC, which is relatively low. Both M 3 and I 3 show validation and test accuracy lower than 30%. Table 6 presents the identification accuracy, using the dataset of REO combined with REC. In using the REO + REC dataset, matrix M 1 also presents the highest identification accuracy. It shows that storing raw EEG in matrix format effectively brings out the important information for the training of CNN. For the computation of power for the raw EEG, image I 2 outperformed matrix M 2 due to the normalization of the values. From the result, the implementation of rearranging the EEG channels is not effective, even with using the combination of REO and REC.
To have a better analysis of the best input type for each dataset, their identification accuracies are tabulated in Table 7. It can be seen that matrix M 1 shows the best performance among each dataset used. e raw EEG contains patterns and information which can discriminate each subject. Such a property makes EEG a suitable trait for biometrics. In comparison to using REC or REO dataset itself, the combination of both achieved a higher validation and test accuracy. e combination of the REC and REO provides extra information for the training of CNN, making the trained model to be more robust towards different individuals.
By converting matrix M 1 into an image, it can ease the dataset management by implementing into other existing structures of machine learning which are image-based implementation. From the results, image I 1 did not perform as well as matrix M 1 . However, in the use of the REO + REC dataset, image I 1 presented an elevated validation and test accuracy of 78.66% and 72.31%, respectively. It is suggested that implementing image I 1 using the REO + REC dataset for biometrics is acceptable for easier handling of large dimension datasets.

Conclusion
From the experiment, arranging raw EEG in the form of matrix presents the best identification accuracy by using the REO + REC dataset. Referring to this, it is noticeable that resting states eyes close and eyes open should be used  Processing image form of raw EEG provides advantages such as less storage space, easier storage management, and dataset sharing among researchers. In addition, it was found that by rearranging the channels according to their correlation removes patterns and information from the raw EEG, which is not suitable for the application of EEG biometrics. It was also found that scaling and storing the power computed from the raw EEG into image form can improve the identification accuracy. is is because huge power values are normalized when they are rescaled into a range from 0 to 255 in the conversion process. is study concludes that matrix form of raw REO + REC EEG is sufficient for a robust biometric identification system.

Data Availability
Previously reported EEG data were used to support this study and are available at https://www.physionet.org/ physiobank/database/eegmmidb. ese prior studies (and datasets) are cited at relevant places within the text as references [35]. e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest.  Computational Intelligence and Neuroscience 9