
EEG-based personal identification method using unsupervised feature extraction and its robustness against intra-subject variability


Published 12 March 2020 © 2020 IOP Publishing Ltd
Citation: Takashi Nishimoto et al 2020 J. Neural Eng. 17 026007. DOI: 10.1088/1741-2552/ab6d89


Abstract

Objective. Brain activity signals are possible biomarkers for personal authentication. However, they are inherently variable due to measurement-environment and subject-dependent factors; electroencephalography (EEG) signals can differ across days even for the same task, subject, and experimental settings. This variability can reduce the consistency of the signals across multiple measurements of a single subject and hence degrade the performance of EEG-based personal identification. In this study, we evaluated the influence of this variability on personal EEG features using our original EEG dataset. Approach. We collected EEG signals from twenty subjects across four rounds (morning and afternoon daily for two days). At each round, we reinstalled an EEG cap on the subjects' scalps. To extract personal EEG features that were invariant across the sessions, we proposed unsupervised learning methods: common dictionary learning and t-distributed stochastic neighbor embedding. To assess the performance of personal identification, we compared two experimental settings: test data recorded in the same round as the training data (Setting SR) and test data recorded in different rounds (Setting DR). Main results. The performance in SR was better than that in DR, suggesting that features dependent on the rounds were dominant. However, the 40% accuracy in DR, which is significantly higher than the chance level, suggests that our proposed method robustly extracted personal features against the variability in most cases. Furthermore, we also evaluated the performance on the problem of detecting individuals who were not registered in the authentication system. In this problem, we obtained a similar result: the variability across rounds influenced the performance. Nevertheless, we obtained good performance in detecting some unknown subjects even in DR. Significance. We found that the variability in EEG data indeed affected the personal features used for personal identification. Even considering this variability, however, our proposed method is applicable in personal authentication scenarios, i.e. personal identification and unknown detection.


1. Introduction

To overcome the limitations of traditional personal authentication methods, such as keys or passwords, which can be shared, duplicated, lost, or stolen, biometric authentication methods have attracted increased attention in recent years. Various biometric features have been examined to develop safer and more convenient authentication methods [1]. Physical features used in biometric personal identification include fingerprints [2], irises [3], palms [4], and faces [5], and behavioral features include voice [6] and signatures [7]. While some of these have already been put into practical use, they remain prone to forgery.

Here, we focus on the use of brain signals as potential biometric features. Structurally and functionally, brain signals offer a higher degree of uniqueness than the physical features described above. Some studies have investigated personal identification using functional magnetic resonance imaging (fMRI) [8] and electroencephalography (EEG) [9–14]. Among the various brain activity measurements, EEG provides a reasonable and practical modality for real-world personal identification purposes [15]. Previous EEG-based personal identification methods exploited brain activity evoked by visual stimuli [9], recorded in the resting state [10–12], elicited while recalling digits [13], and evoked by rapid serial visual presentation [14]. The EEG features they extracted included spectral features [9, 16, 17], temporal features, correlations between signals from different electrodes [10], functional spatial connectivity [11], convolutional neural network-based features [12], spatial patterns of the alpha and beta bands [13], and event-related potentials [14].

One of the most critical problems facing EEG-based personal identification is the variability of the measured signals. This variability arises from variation in the measurement environment and in subject conditions [18]. Variability in the measurement environment includes the displacement of electrodes and impedance fluctuations, which depend not only on the conditions of the cap, scalp, and conduction gel but also on environmental conditions such as temperature and humidity. Additionally, there is variability in subject conditions, mainly physiological, such as metabolism and sweating; today's metabolic state may differ from yesterday's. Such variability degrades the performance of personal identification using EEG-based signatures. However, many prior EEG-based personal identification studies have disregarded the influence of such variability [9–13].

In the present study, we investigated this influence and proposed a personal identification method that is robust against EEG signal variability. To evaluate the influence of variability, we measured EEG signals from twenty subjects across four rounds (morning and afternoon for 2 d). At the onset of each round, we reinstalled an EEG set, including an EEG cap, onto the subjects' scalps. To extract personal features that were invariant across the rounds, we proposed the use of unsupervised learning methods: common dictionary learning [19] and t-distributed stochastic neighbor embedding (t-SNE) [20]. Unsupervised learning is especially suitable when the data dimensionality is much larger than the number of samples, as in our case. In particular, common dictionary learning has previously been applied successfully to remove variability-inducing factors in task decoding [19].

Using the collected data and the proposed methods, we evaluated the performance on two problems. First, we classified EEG signals into personal labels (the personal identification problem); here, we sought to identify an individual among the people registered in an authentication system based on EEG signatures. Second, we classified signals into those from known and unknown people, i.e. whether a target person is registered in the current system or not (the unknown detection problem). To evaluate the influence of the variability on these problems, we compared two experimental settings. In the first setting, the test data belonged to the same round as the training data, but to different sessions (Setting SR). In the second setting, the test data belonged to a different round from that of the training data (Setting DR).

When considering the practicality of a personal identification system, reducing users' effort is very important. In many existing EEG-based personal identification studies, personal features were acquired from task-related brain activity signals [9, 10]. Measuring task-related brain activity requires users to perform a given task, which imposes non-negligible effort in a (daily) authentication scenario. Personal identification based on resting brain activity is much more practical, but can be more difficult because uncontrolled conditions can cause non-negligible variability in the brain signals. In this study, we therefore propose to use task session data to train personal classifiers (registration) and resting session data for testing (identification). This is realistic because users must invest some effort in registration, which occurs once, but then require less effort for identification, which may occur many times, even daily.

Based on the problems and settings mentioned above, we examined how the performance was affected by non-subject-specific variability due to round, facility, and task setting. Brain activity signals likely vary due to factors other than those stemming from the environments and subjects. The present study therefore shows that such variability should not be ignored and highlights the importance of feature extraction methods that are robust against the variability for real-world personal authentication.

2. Methods

2.1. Aim and assumptions

One objective of the present study was to evaluate how personal EEG signal features are impacted by the variability caused by the measurement environment and subject conditions. To meet this objective, we obtained EEG data, which may contain variability coming from the displacement of electrodes, impedance fluctuations, and changes in physiological states. We defined a single series of measurements with relatively low variability as a 'round' and each measurement in a round as a 'session' (see section 2.3 and figure 1 for details).


Figure 1. Experimental timeline. We measured signals across four rounds (daily morning and afternoon rounds for 2 d) per subject. Each round consisted of six task sessions and two resting sessions, of which one preceded the task sessions and the other followed the task sessions. During the resting sessions, resting-state EEG activity was measured for 5–10 min. During the task sessions, the subjects performed a selective visual spatial attention task (attend-leftwards or attend-rightwards) [25]. Each task session consisted of 24 trials. A trial contained three epochs: Rest (6–10 s), Preparation (4 s), and Attention (8 s). During the Preparation and Attention epochs, white flashing bars were repeatedly presented in a rapid stream on the left and right of a central fixation cross. These flashing bars were presented for 100 ms, followed by an inter-stimulus interval (600–800 ms), during which no bars were presented. Bar orientations were chosen randomly from $-30^\circ$, $0^\circ$, and $30^\circ$ with equal probability. During the Preparation epoch, the subjects were instructed to distribute their attention evenly between the two bars, which were both white. During the following Attention epoch, the subjects were instructed to orient their attention to a single bar, the direction (left or right) of which was informed by the color (red or green) of the bar presented at the onset of the Attention epoch. To determine whether the subjects continuously attended towards the cued direction, they were asked to press a button immediately, and only, when the target bar was vertical ($0^\circ$). In our personal identification system, we did not use EEG signals from the Rest and Preparation epochs; that is, the task session data (in the main text) included signals only from the Attention epochs. Figure reprinted from [25] with permission from Elsevier.


Our proposed method for feature extraction is based on the following assumptions:

  • (A1)  
    At each time point, the spatial brain activity pattern is expressed as a combination of a small number of spatial bases that are common across all subjects, rounds, and sessions [21].
  • (A2)  
    Actual measured brain activity is deformed by spatial transformations specific to each subject, round, and session.
  • (A3)  
    Spatial transformations of data from the same subject and round are consistent across sessions.

We applied common dictionary learning to feature extraction, based on the first and second assumptions, and adopted spatial transformations as personal features based on the third assumption.

2.2. Common dictionary learning

In the basic formulation of dictionary learning, a vector of the measured signals $x_t \in {{\mathbb R}}^M$ at time t is represented by $x_t \approx D \alpha_t$ , where $D = [d_1, ... , d_K] \in {{\mathbb R}}^{M \times K}$ is a dictionary whose column vectors dk are called atoms and $\alpha_t \in {{\mathbb R}}^K$ is called a sparse code.

We used an 'overcomplete' dictionary (K  >  M). Dictionary learning with overcomplete bases makes it possible to decompose signals into sparse factors, and has been applied to signal processing functions such as noise removal [22] and compressive sensing [23].

In the present study, $x_t \in {{\mathbb R}}^M$ represents the signal data via a time-frequency analysis of measured brain activities around time t; see below for details. The dimension M corresponds to the number of electrodes (channels). $D \in {{\mathbb R}}^{M \times K}$ is interpreted as the bases of spatial activity patterns (spatial bases), while $\alpha_t \in {{\mathbb R}}^K$ are the weights to combine the spatial bases. The weights are estimated as sparse; see below. Finally, the use of an overcomplete dictionary enabled us to analyze our EEG data based on the first assumption (A1, above).
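For readers who want to experiment with this formulation, the following minimal sketch learns an overcomplete dictionary and OMP-based sparse codes with scikit-learn; the toy data, the library routines, and the reconstruction check are illustrative assumptions rather than our actual implementation.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy stand-in for preprocessed EEG feature vectors x_t (M channels each).
# In the paper M = 64 channels, K = 128 atoms, L = 20 nonzero coefficients.
M, K, T, L = 64, 128, 500, 20
rng = np.random.default_rng(0)
X = rng.standard_normal((T, M))            # rows are samples x_t

# Overcomplete dictionary (K > M); each sample is approximated by at most
# L atoms, i.e. x_t ~= D @ alpha_t with a sparse code alpha_t.
dl = DictionaryLearning(n_components=K,
                        transform_algorithm='omp',
                        transform_n_nonzero_coefs=L,
                        max_iter=50, random_state=0)
codes = dl.fit(X).transform(X)             # sparse codes, shape (T, K)
D = dl.components_.T                       # dictionary, shape (M, K)

reconstruction = codes @ D.T               # approximates X row-wise
print(np.linalg.norm(X - reconstruction) / np.linalg.norm(X))
```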

In a basic application of dictionary learning, as described above, the measured signal xt must be expressible as a linear combination of the bases in the dictionary. Measured signals contain various kinds of features: mental features, which vary from task to task; physical features, which vary across subjects; and variations due to measurement-environment conditions and other physiological conditions. A dictionary would therefore have to cover spatial bases for all of these features. As a result, the dictionary can become too redundant and lack generalizability because of the variations within subjects, rounds, and sessions. Thus, this formulation does not satisfy the second assumption (A2).

Therefore, we applied common dictionary learning [19]. Here, spatial transformations are specific to subjects, rounds, and sessions. That is, measured signals are given by:

$x_{irjt} \approx Z_{irj} D \alpha_{irjt} \qquad (1)$

where $i,r,j,t$ are the indices for subjects, rounds, sessions, and time points, respectively, $x_{irjt} \in {{\mathbb R}}^M$ is a vector of the measured signals, $Z_{irj} \in {{\mathbb R}}^{M \times M}$ is a matrix of spatial transformation, $D \in {{\mathbb R}}^{M \times K}$ is a dictionary, and $\alpha_{irjt} \in {{\mathbb R}}^K$ is a sparse code. Whereas the dictionary D is unique and shared across subjects, rounds, and sessions, each spatial transformation Zirj is specific to a subject-round-session set $(i,r,j)$ . Each spatial transformation was further assumed to be consistent during a given session. We can interpret the measured signal xirjt to be represented as a linear combination of a small number of bases in the deformed dictionary ZirjD, which is now specific to the subject-round-session set $(i,r,j)$ , and the sparse code as a total representation of their weights.

Since the combination of Zirj and D in equation (1) is under-determined, the following constrained optimization problem is introduced:

$\min_{D, Z, A} \ \sum_{i \in {{\mathcal S}}} \sum_{r \in {{\mathcal S}}_i} \sum_{j \in {{\mathcal S}}_{ir}} \left[ \sum_{t \in {{\mathcal S}}_{irj}} \| x_{irjt} - Z_{irj} D \alpha_{irjt} \|_2^2 + \lambda \| Z_{irj} - I \|_{\rm F}^2 \right] \quad {\rm s.t.} \ \|\alpha_{irjt}\|_0 \leqslant L \ {\rm for\ all\ } i, r, j, t \qquad (2)$

where $i \in {{\mathcal S}}$ , $r \in {{\mathcal S}}_i$ , $j \in {{\mathcal S}}_{ir}$ , and $t \in {{\mathcal S}}_{irj}$ index subjects, rounds performed by subject i, sessions in round r performed by subject i, and time points in session j of round r performed by subject i, respectively. Z  =  [Zirj] and $A=[\alpha_{irjt}]$ are the spatial transformations and sparse codes, respectively. $\lambda \geqslant 0$ is a regularization constant, $\|\cdot\|_p$ is the lp norm, and $\|\cdot\|_{\rm F}$ is the Frobenius norm. Each sparse code $\alpha_{irjt} \in {{\mathbb R}}^K$ is constrained to have L or fewer nonzero elements, which is why it is called a sparse code. The regularization term in equation (2) also keeps each Zirj from being extensively distorted from the identity matrix. This regularization is based on the assumption that even if the subjects, rounds, or sessions differ, the spatial bases constituting brain activity patterns are not very different, and thus the dictionary is not extensively transformed.

A stochastic gradient descent algorithm [24] was used to solve the constrained optimization problem in equation (2). See Morioka et al [19] for more details.
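A minimal sketch of one possible stochastic update for equation (2) is given below, assuming orthogonal matching pursuit (OMP) for the sparse-coding step, plain gradient steps for Zirj and D, and unit-norm dictionary atoms; the exact algorithm of Morioka et al [19] may differ in these details.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sparse_codes(D_eff, X, L):
    """OMP sparse coding of the columns of X (M x T) w.r.t. D_eff (M x K).

    OMP assumes approximately unit-norm dictionary columns; this roughly
    holds when Z is close to the identity and the atoms of D are unit norm."""
    return orthogonal_mp(D_eff, X, n_nonzero_coefs=L)       # (K, T)

def sgd_step(D, Z, X, L, lam, rho):
    """One stochastic update on a mini-batch X (M x T) of one (i, r, j) set.

    Decreases ||X - Z D A||_F^2 + lam ||Z - I||_F^2 w.r.t. Z and D,
    with the sparse codes A held fixed after the OMP step (cf. equation (2))."""
    M = X.shape[0]
    A = sparse_codes(Z @ D, X, L)                            # (K, T)
    R = X - Z @ D @ A                                        # residual (M, T)
    grad_Z = -2 * R @ (D @ A).T + 2 * lam * (Z - np.eye(M))
    grad_D = -2 * Z.T @ R @ A.T
    Z -= rho * grad_Z
    D -= rho * grad_D
    D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)  # unit-norm atoms
    return D, Z
```

In the test phase, D would be held fixed and only the spatial transformations of the test data updated, as described in section 2.5.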

2.3. Data acquisition

Experiments were conducted on 14 male and six female subjects between 18 and 25 years of age (mean: 22.0, SD: 2.2) who had normal or corrected-to-normal vision. All subjects provided written informed consent for their participation in all experimental procedures, which were approved by the Ethics Committee of the Graduate School of Informatics, Kyoto University.

Figure 1 depicts our experimental timeline. To evaluate how personal features appeared given signal variability, we conducted four rounds of measurements per subject: once in the morning and once in the afternoon, daily for 2 d. The interval between days 1 and 2 differed across subjects (range: 1–42 d). Each round consisted of six task sessions and two resting sessions, one of which preceded the task sessions while the other followed them. There were breaks between sessions, during which the subjects continued to wear the EEG caps harboring the electrodes. At the onset of each round, we reinstalled the EEG cap. We assumed that there was no displacement of the electrodes within a single round.

During the resting sessions, resting-state EEG activity was measured for 5–10 min. During this time, subjects were asked to stay awake, relaxed, and still, and fixate with open eyes on a white cross centered on a black screen but not concentrate on any one specific thought.

All subjects were seated in a chair 1 m away from a 23.6 inch display for all visual presentations. Their head position was fixed with chin and forehead supports. A keyboard for registering subject responses was placed in front of the subject. A fixation cross was displayed at the center of the display screen and flashing $1.3^\circ$ visual stimuli bars were presented $8^\circ$ to the left or right of the display's center. Subjects were instructed to maintain a steady gaze on the fixation cross and refrain from blinking as much as possible during both the Preparation and Attention epochs.

Using a Biosemi ActiveTwo system, EEG recordings were taken at a sampling rate of 256 Hz with a 64-electrode cap. The electrodes covered the whole head and were positioned based on the International 10–20 System. We also obtained electrooculography (EOG) data using an additional four electrodes to investigate the influence of eye movements on the EEG signals and their personal features.

2.4. Preprocessing

Raw EEG data were passed through a band-pass filter (0.5–40 Hz) and re-referenced to a common average reference. A Butterworth IIR filter was used as the band-pass filter. Next, we performed a time-frequency analysis using Morlet wavelets to extract the spectral power across seven frequency bands. We set the cycle number to 5.83 and spaced the center frequencies evenly on a logarithmic scale from $2^2$ to $2^5$ Hz with a step of 0.5 in the exponent, resulting in the following seven center frequencies: 4.0, 5.7, 8.0, 11.3, 16.0, 22.6, and 32.0 Hz.

We chose these frequency bands because the selective spatial attention task is known to modulate a wide range of frequency bands [26]. For the time-frequency analyses, the data were down-sampled to 32 Hz. Furthermore, after the wavelet analyses, the data at each time point and frequency were normalized to have zero mean and unit L2 norm across channels, yielding $x_{irjt} \in {{\mathbb R}}^M$ for each frequency band, as shown in figure 2. The dimensionality M of $x_{irjt}$ corresponded to the number of channels (64).
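As a concrete illustration of this preprocessing pipeline, the following sketch uses MNE-Python; the library choice, the placeholder data, and the decimation step are assumptions rather than a description of our actual code.

```python
import numpy as np
import mne

# Placeholder Attention-epoch EEG: 24 trials x 64 channels x 8 s at 256 Hz.
sfreq = 256.0
epochs = np.random.randn(24, 64, int(8 * sfreq))

# Band-pass filter (0.5-40 Hz, Butterworth IIR), then common average reference.
filt = mne.filter.filter_data(epochs, sfreq, l_freq=0.5, h_freq=40.0, method='iir')
filt -= filt.mean(axis=1, keepdims=True)           # common average reference

# Morlet wavelet power at the 7 log-spaced centre frequencies 2^2 ... 2^5 Hz,
# decimated to 32 Hz (decim = 256 / 32 = 8).
freqs = 2.0 ** np.arange(2.0, 5.5, 0.5)            # 4.0, 5.7, ..., 32.0 Hz
power = mne.time_frequency.tfr_array_morlet(
    filt, sfreq=sfreq, freqs=freqs, n_cycles=5.83, output='power', decim=8)

# Normalise each (time, frequency) spatial pattern across the 64 channels
# to zero mean and unit L2 norm, giving the vectors x_irjt.
power -= power.mean(axis=1, keepdims=True)
power /= np.linalg.norm(power, axis=1, keepdims=True)
print(power.shape)                                  # (24, 64, 7, 256)
```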


Figure 2. The proposed method for feature extraction. We used task session data for the training of personal classifiers and resting session data for testing.


Because the measurement time of the resting sessions varied between 5 and 10 min, we preprocessed the data of each resting session by trimming it to the first five minutes of the original recording. Data from the Rest and Preparation epochs of the task sessions were rejected, and only data acquired during the Attention epochs were used in the preprocessing for the task session analyses. We excluded behaviorally poor trials from the subsequent analyses using the same criterion as Morioka et al [25]: data were rejected when the null hypothesis that the rate of correct responses was equal to the chance level could not be rejected at a significance level of 0.05.
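A small sketch of such a rejection criterion is shown below, assuming a one-sided binomial test against an assumed chance level of 0.5; the actual test and chance level used in [25] may differ.

```python
from scipy.stats import binomtest

def keep_block(n_correct, n_trials, chance=0.5, alpha=0.05):
    """Keep a block of trials only if its correct-response rate is
    significantly above the (assumed) chance level, i.e. the null
    hypothesis 'rate == chance' is rejected at level alpha."""
    result = binomtest(n_correct, n_trials, p=chance, alternative='greater')
    return result.pvalue < alpha

print(keep_block(20, 24))   # e.g. 20/24 correct responses -> True
```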

2.5. Personal feature extraction

Considering the practicality of our EEG-based personal identification system, we extracted personal features from the task sessions, which corresponds to registration in the authentication system, and tested against the corresponding features extracted from the resting sessions, which corresponds to identification by the authentication system.

Figure 2 depicts the proposed method of personal feature extraction. The subscripts $i,r,j,t$ are indexes corresponding to subjects, rounds, sessions, and time points. In the proposed method, we applied common dictionary learning to the preprocessed brain activity signals $x_{irjt} \in {{\mathbb R}}^M$ to extract the spatial transformations $Z_{irj} \in {{\mathbb R}}^{M \times M}$ as personal features. Here, the common dictionary D was estimated simultaneously with the spatial transformations of the training data. In the test phase, however, we calculated only the spatial transformations of the test data, with the dictionary D fixed. We set the following parameters based on some initial attempts. The dictionary size K  =  128 was twice the number of channels, so as to maintain a variety of dictionary atoms while preserving computational efficiency. The sparseness constant L  =  20 was about one third of the number of channels, reducing the number of active elements to allow sparse solutions while retaining sufficient information for personal identification. The regularization constant $\lambda = 10^{-7}$ was chosen to balance the expressiveness of the personal features with the stability of Zirj. Other parameters for the optimization procedure were determined based on the convergence profile of the objective function in equation (2) under the stochastic gradient descent algorithm: an initial learning rate of $\rho = 5$ , a mini-batch size of 512, $R = 2 \times 10^5$ iterations for training, and $R = 2 \times 10^3$ iterations for testing.

To visualize the influence of the variability, we adopted low-dimensional feature extraction using t-distributed stochastic neighbor embedding (t-SNE) [20, 27]. t-SNE is a dimensionality reduction algorithm that is well suited to visualizing high-dimensional data; it embeds high-dimensional data points in a low-dimensional space so that the similarity between data points is adequately reflected. The higher the similarity between data points in the high-dimensional space, the closer they are embedded in the low-dimensional space. After common dictionary learning, we embedded the spatial transformations $Z_{irj} \in {{\mathbb R}}^{M \times M}$ via t-SNE into a low-dimensional feature space ($d = 2 {\rm ~or~} 3$ ) and obtained the feature vectors $v_{irj} \in {{\mathbb R}}^d$ . We used the Euclidean distance as the measure of dissimilarity and set an exaggeration parameter of four so that the feature vectors expressed the full diversity of spatial transformations with a moderate number of clusters. A large exaggeration makes t-SNE learn larger joint probabilities of embedded points and places larger spaces between different clusters of embedded points.
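The embedding step could look like the following sketch, which flattens each transformation matrix into a vector and uses scikit-learn's TSNE with its early_exaggeration parameter standing in for the exaggeration of four; the flattening and the library mapping are assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

# Z_all: spatial transformations Z_irj stacked as (n_sessions, M, M);
# a random placeholder is used here (M = 64 channels).
n_sessions, M = 160, 64
Z_all = np.random.randn(n_sessions, M, M)

# Flatten each M x M transformation into one feature vector and embed it
# into d = 2 dimensions with Euclidean distance and exaggeration 4.
features = Z_all.reshape(n_sessions, -1)
tsne = TSNE(n_components=2, metric='euclidean',
            early_exaggeration=4.0, random_state=0)
v = tsne.fit_transform(features)       # feature vectors v_irj, shape (n_sessions, 2)
```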

3. Results

To evaluate the influence of the variability stemming from different rounds and the performance of the proposed unsupervised feature extraction, we examined the performance on two problems. The first problem, described in section 3.1, is personal identification, in which we predicted personal labels from EEG signals. Section 3.2 describes the second problem, unknown detection, in which we detected EEG signals recorded from a person who had not been registered in the training dataset. For each problem, we used two experimental settings to evaluate the influence of the variability caused by different rounds: in Setting SR, the test samples were recorded in the same round as the training samples, while in Setting DR, the test samples were recorded in rounds different from that of the training samples. In either setting, one of the four rounds was selected as the training round, and the samples recorded in its six task sessions were used for training. In Setting SR, the samples recorded in the two resting sessions of the same round were used for testing, whereas in Setting DR, the samples recorded in the two resting sessions of each of the remaining three rounds were used for testing. When evaluating the performance in either setting, we averaged over four runs, changing the training round, because there were four choices of training round.
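The round-wise evaluation protocol can be summarized by the following sketch; the feature containers and the train_and_test helper are hypothetical placeholders rather than part of our actual pipeline.

```python
import numpy as np

def evaluate(features, train_and_test, rounds=(0, 1, 2, 3)):
    """Average an accuracy metric over the four choices of training round.

    `features[(r, kind)]` is assumed to hold the data of round r, where
    `kind` is 'task' or 'rest'; `train_and_test(train, tests)` returns an
    accuracy. Setting SR tests on the same round's resting sessions,
    Setting DR on the resting sessions of the remaining three rounds."""
    acc_sr, acc_dr = [], []
    for train_round in rounds:
        train = features[(train_round, 'task')]
        acc_sr.append(train_and_test(train, [features[(train_round, 'rest')]]))
        others = [features[(r, 'rest')] for r in rounds if r != train_round]
        acc_dr.append(train_and_test(train, others))
    return np.mean(acc_sr), np.mean(acc_dr)
```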

3.1. Personal identification

We formulated the personal identification problem as a twenty-class classification. We employed an error correcting output code (ECOC), a method for solving multi-class classification problems using multiple binary classifiers [28]; in particular, we used one-versus-the-rest encoding. We adopted a support vector machine (SVM) with a Gaussian kernel for the constituent binary classifiers in the ECOC. When decoding from the set of twenty binary classifiers, we simply chose the classifier with the maximum discriminant function value and regarded the positive label of the selected classifier as the predicted subject's identifier. The accuracy for each choice of training round was calculated as the ratio of successfully identified data points among 40 (=20 (subjects) $\times 1$ (round) $\times 2$ (sessions)) in Setting SR, and among 120 (=20 (subjects) $\times 3$ (rounds) $\times 2$ (sessions)) in Setting DR. The results were then averaged across the four choices of training round.
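A sketch of this classification step with scikit-learn is shown below, using OneVsRestClassifier with Gaussian-kernel SVMs as a stand-in for the one-versus-the-rest ECOC decoder; the placeholder feature arrays and the default hyperparameters are assumptions.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Placeholder t-SNE features: 20 subjects x 6 task sessions for training,
# 20 subjects x 2 resting sessions for testing (Setting SR), d = 2.
rng = np.random.default_rng(0)
v_train, y_train = rng.standard_normal((120, 2)), np.repeat(np.arange(20), 6)
v_test, y_test = rng.standard_normal((40, 2)), np.repeat(np.arange(20), 2)

# One-versus-the-rest encoding with Gaussian-kernel SVMs; the predicted
# subject is the positive class of the binary classifier with the
# maximum discriminant-function value.
clf = OneVsRestClassifier(SVC(kernel='rbf')).fit(v_train, y_train)
scores = clf.decision_function(v_test)          # shape (n_test, 20)
y_pred = clf.classes_[np.argmax(scores, axis=1)]
print('identification accuracy:', np.mean(y_pred == y_test))
```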

Figure 3 depicts the results of our multi-subject classification. When selecting training and test data from the same round (SR), we achieved highly accurate classification across all frequency bands; in particular, the accuracy was over 99% when d  =  2. These results are consistent with those reported in our preliminary study [29]. On the other hand, when using training and test data from different rounds (DR), the classification accuracy decreased. This accuracy depended on the frequency band used and was higher than 40% at 11.3 Hz, which falls within the $\alpha$ band (8–12 Hz), and at higher frequency bands (figure 3 (right)). This 40% accuracy was much higher than the chance level of 5% for the balanced twenty-class classification problem. Subject-wise identification accuracy was significantly higher than the chance level ($p < 10^{-6}$, one-sided t-test, n  =  20), although a couple of subjects showed low accuracies of $0\%$ (subject G) and $12.5\%$ (subject P). The extremely low accuracy of subject G was due to the scattered distribution of its transformation matrices (see section 4.1 and figure 5 for details). These results suggest that the EEG signal features obtained by our unsupervised learning methods were indeed disturbed by variability depending on the measurement rounds, but still included personal features that could be used for personal identification. In addition, the classification accuracy was higher for d  =  3 than for d  =  2, suggesting that the effective dimensionality of the person-specific transformations Z  =  [Zirj] was as small as three or two, but that embedding into the minimal dimensional space (d  =  2) was accompanied by some information loss.


Figure 3. The results of multi-subject classification. The left panel shows the results when using training and test data from the same round (Setting SR) and the right panel shows the results when using data from different rounds (Setting DR). We evaluated the classification accuracy of data from seven frequency bands separated by time-frequency analysis in two cases of embedding dimensionality of t-SNE: $d = 2 {\rm ~and~} 3$ .


3.2. Unknown detection

In the unknown detection problem, we sought to determine the effectiveness of EEG for decoding known/unknown personal labels. Here, 'unknown' means that the corresponding person's EEG data were not included in the training dataset. In the personal identification problem, the frequency bands at 11.3, 22.6, and 32.0 Hz yielded good accuracy. Among them, we assessed 11.3 Hz in the unknown detection problem and in the later analyses because a previous study suggested that the spatial distribution of $\alpha$ band power reflects personal features [11]. We set the embedding dimension of t-SNE to its minimal value d  =  2. As in section 3.1, we compared the performance of Setting SR and Setting DR in this unknown detection. Note that when the subject to be predicted is unknown, the two settings are practically the same; however, they differ when the subject is known. Indeed, the training dataset includes the known subject's personal feature obtained from the same round in SR, but from another round in DR.

One-class classification is an algorithmic approach used for outlier detection. Unlike normal classification problems, one-class classification trains a classifier to discriminate whether each data point belongs to a certain class or not. We designed a one-class classifier that attempted to discriminate between known and unknown labels based on the training dataset from all known subjects. When testing a subject, we defined a decision boundary on the classification score and judged the subject as known (positive) or unknown (negative) according to whether the score was above or below the decision boundary, respectively. We used a one-class SVM [30] with a Gaussian kernel as the one-class classifier. For validation, we employed the leave-one-subject-out (LOSO) paradigm. That is, when training the one-class classifier, we used data from the six task sessions in the training round for 19 subjects. When testing, we predicted the known/unknown labels of all 20 subjects based on their resting session data. This means that 19 subjects were indeed known, because the one-class classifier had been trained on their task session data, while the remaining subject was unknown, because that person's data had not been used. By changing the excluded subject one by one, we constructed 20 classifiers, and the performance was examined by averaging their results. Given that this known/unknown identification largely depends on the chosen decision boundary, the area under the receiver operating characteristic (ROC) curve (AUC) was adopted as the evaluation index. The ROC curve is drawn by changing the threshold for the decision boundary, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis. The more the AUC surpasses 0.5, the more effective the identification method is. The AUC for each pair of training round and unknown subject was first calculated; then, as for personal identification, the results were averaged across the four choices of training round.

To sufficiently express the complexity of the data from the 20 subjects and across the four rounds, the $\nu$ parameter, which adjusts the number of support vectors in one-class classification, was set to 1. The kernel scale was set to 0.11 in Setting SR and 0.16 in Setting DR. To determine these parameter values, we first performed nested cross-validation for each of the twenty LOSO processes mentioned above and then took the average of the twenty optimized values; that is, we used a fixed value within each of SR and DR, regardless of the unknown subject.
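The following sketch illustrates one LOSO fold of this procedure with scikit-learn's OneClassSVM; the mapping of the reported kernel scale to the gamma parameter is an assumption about the original implementation.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

def unknown_detection_auc(train_feats, test_feats, test_is_known,
                          kernel_scale=0.11, nu=1.0):
    """One leave-one-subject-out fold of unknown detection.

    train_feats: task-session features of the 19 known subjects;
    test_feats: resting-session features of all 20 subjects;
    test_is_known: 1 for known subjects, 0 for the excluded (unknown) one.
    A MATLAB-style kernel scale s is mapped to gamma = 1 / s**2 here,
    which is an assumption rather than a documented equivalence."""
    clf = OneClassSVM(kernel='rbf', nu=nu, gamma=1.0 / kernel_scale ** 2)
    clf.fit(train_feats)
    scores = clf.decision_function(test_feats)   # larger score = more 'known'
    return roc_auc_score(test_is_known, scores)
```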

Figure 4 shows the results of the unknown detection. The left panel shows that the AUC was high for many unknown subjects. The right panel, however, shows that there were cases in which an effective classifier was learned and cases in which it was not. This is due to the variability of the extracted personal features obtained in different rounds, as previously discussed. Nevertheless, there were some effective classifiers, implying that the performance of unknown detection could be further elevated by improving the feature extraction and classification methods.


Figure 4. The results of unknown detection. The left panel shows the results when using training and test data from the same round (Setting SR), and the right panel shows the results when using those from different rounds (Setting DR). In the leave-one-subject-out paradigm, there are twenty possible unknown subjects, and the letters along the horizontal axis indicate the unknown subject to be detected.


4. Discussion

The results of sections 3.1 and 3.2 suggest that brain activity (EEG) signals include personal features that are consistent across different times of day, even after reinstalling the EEG caps, and across different days, even with possible changes in the physiological states of the subjects. However, the difficulty of extracting features from variable data differs from subject to subject, which caused difficulty in the unknown detection problem, especially in Setting DR. This can be confirmed visually by the illustration of the personal EEG signal features in the present study.

4.1. Variability specific to subjects and rounds

Figure 5 illustrates the data (here, session-wise transformation matrices) embedded in the two-dimensional feature space determined by t-SNE. In this figure, each letter represents a single subject, and the center coordinates of each character correspond to the data point of a single session. It can be seen that the data from each subject form four clusters. This indicates minor variation between different sessions, supporting the validity of the third assumption (A3), which stated that spatial transformations of data from the same subject and round are consistent across sessions. The number of clusters, four, corresponds to the number of rounds. Furthermore, for most subjects, the subject-wise inter-cluster distance was smaller than the inter-subject distance. That is, the intra-subject variability, mostly due to variation across rounds, was smaller than the inter-subject variability. On the other hand, for some subjects, such as 'D', 'G', and 'O', the four clusters were distinctly apart from one another, likely because their personal features were greatly affected by inter-round variability. In particular, the four clusters of subject G were located far from one another, which caused the low personal identification accuracy of this subject in Setting DR. We found no relation between this intra-subject, inter-round variability and the demographic features of the subjects, such as sex and age. Moreover, there was no significant effect of the interval between the two measurement days.


Figure 5. Data embedded in the two-dimensional feature space (d  =  2) determined by t-SNE. Here, all data (20 subjects, four rounds, eight sessions) were used as training data; that is, the common dictionary D and the feature space were determined using all available data. In this figure, each letter represents the identifier of a single subject, and the center coordinates of each character correspond to the data point (here, the session-wise transformation matrix) of a single session.


Figure 6 depicts two examples of trained one-class classifiers. Each point corresponds to data from a single session. Classifier training was performed using data from training rounds in which 19 subjects were known and one subject was unknown. The black, green, and red points represent training data points (i.e. task sessions), test data points (i.e. resting sessions) from known subjects, and test data points from the sole unknown subject, respectively. Since the one-class classifiers in both panels were derived from the same training data, the contour shapes are the same. We see that a multimodal distribution, reflecting the clustered distribution of each subject, was learned as a discriminative model of the known-subject data. In the upper panel (SR), data points from known subjects have large classification scores, while those from the unknown subject have small classification scores, so a clear decision boundary can be derived from the classification scores. In the lower panel (DR), on the other hand, data points from many known subjects have classification scores smaller than those of the unknown subject, making it difficult to draw a clear decision boundary. The results in figure 4 reflect this difficulty in setting an appropriate threshold for the decision boundary.


Figure 6. A couple of examples of trained one-class classifiers. The upper panel shows an example when selecting training and test data for known subjects from the same round (Setting SR), while the lower panel shows an example when selecting these data from different rounds (Setting DR). The colored lines represent the contours of classification scores of the one-class classification. Each colored dot corresponds to a data point from a certain (common) single session. The black, green, and red dots represent training data points, test data points from known subjects, and test data points from the sole unknown subject, respectively.


4.2. Variability from measurement facilities

Variability due to differences in the measurement facility and environment is likely to have a significant impact on EEG signals. As explained in section 2.2, the dictionary D in common dictionary learning is shared regardless of subjects, rounds, and sessions, and ideally does not include subject-dependent elements. We therefore tested whether our proposed method works for the personal identification problem even when the common dictionary D is derived from a dataset with different subjects.

We attempted to transfer the dictionary estimated from the Advanced Telecommunications Research Institute International (ATR) dataset [25] to the personal identification problem for the subjects registered in our Kyoto University (KU) dataset. More concretely, the common dictionary was estimated by applying the method described in section 2.2 to the ATR dataset. Using the estimated dictionary, we then constructed the personal identification system described in section 3.1 based on the KU dataset. It should be noted that no KU data were used to construct the dictionary. The same ActiveTwo system (BIOSEMI) was used for the measurements of both the ATR and KU datasets, although the brain activity signals were expected to differ due to differences in the measurement rooms and other factors. The experimental settings for the two datasets are summarized in table 1. One difference was in the EEG caps: the cap used to obtain the KU dataset was developed by BIOSEMI, while that used to obtain the ATR dataset was our original cap, which simultaneously measured EEG and functional NIRS; however, the EEG electrode placement followed the International 10-20 System for both datasets. Moreover, there was no overlap in subjects between the two datasets.

Table 1. Experimental settings for the ATR and KU datasets.

                    ATR                                    KU
Subjects            General call                           Students
Sex                 40 males                               14 males and 6 females
Dominant hand       Right-handed                           No restriction
Age                 20–40 (mean: 24.6, SD: 6.4)            18–25 (mean: 22.0, SD: 2.2)
Vision              Normal or corrected-to-normal          Normal or corrected-to-normal
Places/facilities   ATR (Kyoto)                            KU (Kyoto)
Instruments         ActiveTwo system (BIOSEMI)             ActiveTwo system (BIOSEMI)
Rooms               Unshielded                             Unshielded
Brightness          Dark                                   Bright
Tasks               Spatial attention task/resting-state   Spatial attention task/resting-state
Rounds              Once                                   4 times (morning and afternoon for 2 d)
Measurements        EEG, EOG, NIRS                         EEG, EOG

Figure 7 is a bar graph comparing the classification results across different dictionaries when training and test data were selected from the same round (Setting SR) in the KU dataset. Here, we set the embedding dimension of t-SNE to d  =  3. There was no significant difference in classification accuracy between bars (a) and (b), indicating that a dictionary could be used consistently between the two datasets. In addition, given that there was no appreciable difference in classification accuracy between bars (b) and (c), the dictionary exhibited consistently high expressive ability regardless of the number of data points used for training. However, these results do not necessarily indicate that the variability derived from the measurement facility is small enough to be ignored.


Figure 7. A graph comparing personal identification results in Setting SR across different dictionaries. The dictionary for bars (a) was trained by data from six task sessions across 20 subjects in the KU dataset. The dictionary for bars (b) was trained by data from six task sessions across 20 subjects in the ATR dataset. The dictionary for bars (c) was trained by data from eight task sessions across 40 subjects in the ATR dataset.


Figure 8 illustrates the personal features of the ATR and KU datasets, which were calculated using the common dictionary trained on data from all 40 ATR subjects (i.e. corresponding to bars (c)). As shown in the figure, the personal features of the two datasets, reflecting the differences in the measurement facilities, were clearly separated into two groups. This indicates that the variability derived from the difference in measurement facility considerably influenced the personal EEG features determined here. Since the use of a common dictionary derived from the ATR dataset did not introduce additional intra-subject variability, it was still suitable for personal identification. However, given that the subjects included in each dataset were different, future studies should analyze personal EEG features using data from common subjects measured at multiple facilities, to evaluate the influence of facility-related variability in greater detail and to extract more robust personal features from such variable data.


Figure 8. The personal features of the ATR and KU datasets.


4.3. Variability from electrooculography (EOG)

EEG data in the present study were affected by the measurement environment and by potential physiological factors such as eye movements, myoelectric activity, body movements, sweating, and heartbeat changes. Since we measured EOG in the vertical and horizontal directions using four electrodes simultaneously with EEG, here we discuss the effects of eye movements on EEG-based personal identification.

Figure 9 depicts the comparison between personal identification results with and without EOG removal. Bars (a) correspond to the results shown in figure 3. Bars (c) in the right panel show that personal identification performance using only EOG data was poor, with an accuracy of around 0.1–0.2. However, the accuracy of EOG at high frequencies was slightly higher, suggesting that EOG at high frequencies contains more personal feature components than at low frequencies. Comparing bars (a) and (b) in the right panel, we see that the accuracy of bars (b) was higher than that of bars (a) in the low frequency bands (4.0–8.0 Hz), but that the accuracy of bars (a) was higher than that of bars (b) in the high frequency bands (11.3–32.0 Hz). This suggests that incorporating EOG signals into the EEG-based personal identification system elevates the identification accuracy in the high frequency bands but not in the low frequency bands. In addition, the difference in accuracy between bars (a) and (b) was smaller than the accuracy of bars (b) itself, suggesting that more of the personal features required for this identification were contained in EEG than in EOG.


Figure 9. Comparison of personal identification with and without EOG removal. We compared bars (a), the original measured EEG data; bars (b), EEG data from which EOG components were removed by independent component analysis; and bars (c), personal identification based only on the original measured EOG data. For bars (a) and (b), we performed feature extraction using the proposed method and set the embedding dimension of t-SNE to d  =  3. For bars (c), we applied the same preprocessing to the EOG data as to the EEG data and averaged the preprocessed data across time to obtain a two-dimensional feature vector. The left panel depicts the results when training and test data were selected from the same round (SR), while the right panel shows the results when selecting them from different rounds (DR).


4.4. Variability in resting-state brain activity caused by the task

The variability in resting-state brain activity goes beyond that induced by measurement-environment factors alone. Resting-state brain activity is also known to change in response to an event that occurred immediately beforehand [31]. As shown in figure 1, resting sessions were held at the beginning (Rest1) and the end (Rest2) of each round, with six task sessions (Task) in between. To evaluate how resting-state brain activity changed through the task, using the spatial transformations as the feature of each session's data, we compared the Frobenius distances between Rest1 and each task session (Rest1-Task) and between Rest2 and each task session (Rest2-Task) within a single round of a single subject. The distance to each of the six task sessions was calculated in each round for each subject; that is, the number of data points was 20 (subjects) $\times$ 4 (rounds) $\times$ 6 (task sessions) = 480, both for Rest1-Task and for Rest2-Task.

Figure 10 depicts the distributions of the Rest1-Task and Rest2-Task distances as a box plot. It also depicts the result of a two-sample t-test whose null hypothesis was that the Rest1-Task and Rest2-Task data are derived from independent random samples of distributions with equal means and equal but unknown variances. The p value was $p = 1.9\times10^{-6}$; therefore, the null hypothesis was rejected at a significance level of 0.1%. This result indicates that Rest2 was significantly more similar to Task than Rest1 was, and confirms variability in resting-state brain activity caused by performing the task. On the other hand, as seen in figure 5, data from the same round belong to the same cluster in the feature space, indicating that the influence of this variability in resting-state brain activity on personal identification was minor.
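The comparison can be reproduced schematically as follows, with placeholder distance samples standing in for the 480 Frobenius distances per condition.

```python
import numpy as np
from scipy.stats import ttest_ind

def frobenius_distances(Z_rest, Z_tasks):
    """Frobenius distance from one resting session's transformation matrix
    to each of the six task sessions' transformation matrices in the same round."""
    return [np.linalg.norm(Z_rest - Z_task, ord='fro') for Z_task in Z_tasks]

# d_rest1 and d_rest2 would each collect 480 distances
# (20 subjects x 4 rounds x 6 task sessions); placeholders are used here.
rng = np.random.default_rng(0)
d_rest1 = rng.normal(1.00, 0.1, 480)
d_rest2 = rng.normal(0.95, 0.1, 480)

# Two-sample t-test assuming equal but unknown variances (cf. section 4.4).
t_stat, p_value = ttest_ind(d_rest1, d_rest2, equal_var=True)
print(p_value)
```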


Figure 10. The difference between the Rest1-Task and Rest2-Task distances, depicted in a box plot.


4.5. Accuracy change by the number of subjects

Although the personal identification accuracy in Setting DR was substantially lower than that in Setting SR, it was still significantly higher than the chance level, suggesting that our unsupervised learning methods could extract personal signatures even from fragile EEG signals. Figure 11 shows the accuracy of personal identification in Setting DR when changing the number of subjects registered in the authentication system. Obviously, the chance level (dashed line) decreases as the number of subjects increases. The EEG-based personal identification accuracy was higher than the chance level for every number of registered subjects. When the number of subjects increased from 2 to 10, the personal identification performance degraded due to the increasing overlap of the data point distributions of different subjects. Interestingly, however, a further increase in the number of subjects elevated the performance, probably because the increase in the amount of data enhanced the stability of the supervised multi-class classifier. We may speculate that using more data samples, either for registering each subject (i.e. more sessions) or for registering more subjects, would increase the robustness of the classifier.
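A sketch of this subset-wise evaluation is given below; the evaluate_subset helper is hypothetical, and the full enumeration over all combinations of N subjects out of 20 can become large for intermediate N.

```python
import numpy as np
from itertools import combinations

def accuracy_vs_n_subjects(subject_ids, evaluate_subset, n_values=range(2, 21)):
    """Average identification accuracy over all combinations of N subjects.

    `evaluate_subset(subset)` is a hypothetical helper that trains the
    multi-class classifier on the task-session features of `subset` and
    returns its accuracy on the corresponding resting-session features
    from different rounds (Setting DR)."""
    results = {}
    for n in n_values:
        accs = [evaluate_subset(subset)
                for subset in combinations(subject_ids, n)]
        results[n] = (np.mean(accs), 1.0 / n)   # (accuracy, chance level)
    return results
```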


Figure 11. Personal identification accuracy (solid line) in Setting DR when changing the number of subjects registered in the authentication system. Since we are not interested here in the stability of the unsupervised feature extraction, we fixed the common dictionary as the one obtained using all twenty subjects. Note, however, that the task session data for constructing the classifier and the resting session data for testing were disjoint. When evaluating the identification accuracy, we tested all combinations of N subjects out of 20 and averaged the accuracy over the combinations, where N denotes the number of subjects on the horizontal axis.


4.6. Task versus rest session for registration phase

In this study, we assumed that a personal authentication system in which registration uses task session data and identification is performed with resting session data is practically reasonable. Figure 12 shows the personal identification accuracy for different combinations of training and testing samples in Setting DR. Because the visual attention task keeps the mental state of the subjects well controlled, using task session data for both registration and identification ('Task x Task') naturally exhibited higher accuracy than the other combinations. In contrast, the accuracy when using resting session data for both registration and identification ('Rest x Rest') was poor. Since the mental state of the subjects was not controlled in the resting sessions, the personal signatures contained in the EEG signals during those sessions were not stable enough to be registered in our authentication system. Interestingly, the personal identification accuracy when using task session data for registration and resting session data for identification ('Task x Rest') was similar to that of the 'Task x Task' combination. These results support our assumption above.


Figure 12. The personal identification accuracy for different combinations of training and testing samples in Setting DR. The error bars show the standard deviation over subjects. Task x Task: EEG features (obtained by our unsupervised learning method) from task sessions were used for registration (training the multi-class classifier), and those from task sessions in different rounds were used for identification (testing). Rest x Rest: registration and identification were both performed with EEG features from resting sessions in different rounds. Task x Rest 2: registration used EEG features from task sessions, and identification used features from the post-task resting sessions in different rounds. Task x Rest 1: registration used EEG features from task sessions, and identification used features from the pre-task resting sessions in different rounds. A t-test shows that the accuracy for 'Rest x Rest' is significantly lower (p  <  0.01) than those for the other three combinations.


Comparing the test on features extracted from the resting session before the task ('Task x Rest1') with the test on those from the resting session after the task ('Task x Rest2'), the accuracy was almost the same. Although the features extracted from the resting sessions before and after the task differed, as shown in figure 10, this difference was minor for our personal identification.

5. Conclusion

In the present study, we evaluated how EEG signal variability, derived from measurement-environment factors and subject-dependent factors, influences the performance of personal identification. To do this, we measured EEG signals over four rounds (morning and afternoon daily for 2 d) per subject (n  =  20). We applied an unsupervised learning approach, using common dictionary learning and t-SNE, to the EEG signal data to extract personal features that are robust against the variability.

To evaluate the influence of the variability, we compared the identification accuracy in two experimental settings. In the first setting, Setting SR, the training and test data were from the same round; in the second setting, Setting DR, they were from different rounds. The results of the personal identification problem suggested that EEG signals include some personal features that are robust despite the variability we incurred. In addition, the results of the unknown detection study showed individual differences in its performance, suggesting that practical unknown detection would require further improvements in feature extraction and identification methods. One possible idea would be to develop a data-driven encoding model for the ECOC multi-class classification [32].

Various types of variability may influence brain activity (EEG) signals. Variability derived from the measurement facility had a minor effect on personal identification, but it clearly separated the extracted features according to the facility at which they were measured. To extract robust personal features from such variable data, we suggest that future work analyze measurement data from common subjects obtained at multiple facilities. Furthermore, we found that EOG might improve EEG-based personal identification accuracy, especially in the high frequency bands, although EEG alone also achieved sufficient personal identification. Additionally, variability in resting-state brain activity could be caused by preceding tasks, although we also found that the influence of this variability on personal identification was minor.

EEG-based individual identification technology has significant implications for biometric authentication. The method may be applicable, for instance, to head-mounted devices or to high-security transactions such as large money transfers. In addition, this technology may also be applied to medical diagnosis: changes in personal signal characteristics may predict future or ongoing mental abnormalities, and hence can provide information for, for example, aiding treatment and setting diagnosis strategies.

The present study is relevant not only to the practical use of EEG-based personal identification but also to the evaluation and assessment of various sources of brain activity signal variability. In the future, research on individual differences in brain activity may provide insights into the neuroscience of human personality, for instance, as well as other emerging scientific domains.

Acknowledgments

This study was supported by the Post-K Project from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Grant No. hp170218, and JSPS KAKENHI Grant Nos. JP19H04180 and 17H06310.
