In-ear EEG biometrics for feasible and readily collectable real-world person authentication

The use of EEG as a biometrics modality has been investigated for about a decade; however, its feasibility in real-world applications is not yet conclusively established, mainly due to issues with collectability and reproducibility. To this end, we propose a readily deployable EEG biometrics system based on a 'one-fits-all' viscoelastic generic in-ear EEG sensor (collectability), which does not require skilled assistance or cumbersome preparation. Unlike most existing studies, we consider data recorded over multiple recording days and for multiple subjects (reproducibility) while, for rigour, the training and test segments are not taken from the same recording days. A robust approach is considered, based on the resting state with eyes closed paradigm and the use of both parametric (autoregressive model) and non-parametric (spectral) features, and supported by simple and fast cosine distance, linear discriminant analysis and support vector machine classifiers. Both the verification and identification forensics scenarios are considered, and the achieved results are on par with the studies based on impractical on-scalp recordings. Comprehensive analysis over a number of subjects, setups, and analysis features demonstrates the feasibility of the proposed ear-EEG biometrics, and its potential in resolving the critical collectability, robustness, and reproducibility issues associated with current EEG biometrics.

I. INTRODUCTION
Person authentication refers to the process of confirming the claimed identity of an individual, and is already present in many aspects of life, such as electronic banking and border control. The existing authentication strategies can be categorised into: 1) knowledge-based (password, PIN), 2) token-based (passport, card), and 3) biometric (fingerprints, iris) [1]. The most extensively used recognition methods are based on knowledge and tokens; however, these are also the most vulnerable to fraud, such as theft and forgery, and can be straightforwardly used by imposters. In contrast, biometric recognition methods rest upon unique physiological or behavioural characteristics of a person, which then serve as 'biomarkers' of an individual, and thus largely overcome the above vulnerabilities. However, at present, biometric authentication systems are cumbersome to administer and require considerable computational and manpower overheads, such as special recording devices and the corresponding classification software.
With the current issues in global security, we have witnessed a rapid growth in biometrics applications based on various modalities, which include palm patterns with high spectral wave [2], patterns of eye movement [3], patterns in the electrocardiogram (ECG) [4], and otoacoustic emissions [5]. Each such biometric modality has its strengths and weaknesses, and typically suits only a chosen type of application and its corresponding scenarios [6]. A robust biometric system in the real world should satisfy the following requirements [1]:
• Universality: each person should possess the given biometric characteristic,
• Uniqueness: no two people should share the given characteristic,
• Permanence: the biometric characteristic should neither change with time nor be alterable,
• Collectability: the characteristic should be readily measurable by a sensor and readily quantifiable.
In addition, a practical biometric system must be harmless to the users, and should optimise the trade-off between performance, acceptability, and circumvention; in other words, it should be designed with the accuracy, speed, and resource requirements in mind [6].
One of the currently investigated biometric modalities is the electroencephalogram (EEG), an electrical potential between specific locations on the scalp which arises as a result of the electrical field generated by assemblies of cortical neurons, and reflects the brain activity of an individual, such as intent [7]. From a biometrics perspective, the EEG fulfils the above requirements of universality, as it can be recorded from anyone, and of uniqueness. Specifically, the individual differences in EEG alpha rhythms have been examined [8] and reported to exhibit significant power in discriminating individuals [9] in the area of clinical neurophysiology. Brain activity is neither exposed to the surroundings nor can it be captured at a distance; therefore the brain patterns of an individual are robust to forgery, unlike face, iris, and fingerprints. The EEG is therefore more robust against imposters' attacks than other biometrics and among different technologies to monitor brain function. However, in order to utilise EEG signals in the real world, several key properties such as permanence and collectability must be further addressed.
The 'proof-of-concept' for EEG biometrics was introduced in our own previous works [10] and [11], and most of the follow-up studies were conducted over only one day (or even a single trial) of recording, with EEG channels covering the entire head, while in the classification stage the training and validation datasets were randomly selected from the same recording day (or the same trial). Apart from its usefulness as a proof-of-concept, this setup does not satisfy the feasibility requirement for a real-world biometric application, since:
• Recording scalp-EEG with multiple electrodes is time-consuming to set up and cumbersome to wear. Such a sensor therefore does not meet the collectability requirement.
• EEG recordings from one day (or a single trial) cannot truly evaluate the performance in identifying features of an individual, as this scenario does not satisfy the permanence requirement either, see details in Section II-B.
• The training and validation data within this scenario are inevitably mixed, thereby introducing a performance bias in classification. The classification results from such studies are therefore unrealistically high, and we shall refer to this setting as the biased scenario.
Therefore, for feasible EEG biometrics, the EEG sensor should be wearable, easy to administer, and fast to set up, while in order to evaluate the performance, the recorded signal should be split in a rigorous way: the training and validation datasets in the classification stage should be created so as not to share the same recording days, a setting we refer to as the rigorous scenario.
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
While a considerable body of research has been undertaken to explore the EEG biometrics and to find the most informative subject-specific characteristic of EEG (uniqueness), most studies either focused on reducing the number of electrodes (collectability) or on evaluating whether the traits are temporally robust (permanence) by using EEG data obtained over multiple recording days; for more details see Section II-C.
In this paper, based on our works in [11] and [12], we bring EEG-based biometrics into the real-world by resolving the following critical issues: 1) Collectability. Biometrics verification is evaluated with a wearable and easy to set-up in-ear sensor, the so-called ear-EEG [12], 2) Uniqueness and permanence. These issues are addressed through subject-dependent EEG features which are recorded over temporally distinct recording days, 3) Reproducibility. The recorded data are split into the training and validation data in two different setups, biased and rigorous setup, 4) Fast response. The classification is performed by both a fast non-parametric (cosine distance) and standard parametric approaches (linear discriminant analysis and support vector machine). Through these distinctive features, we successfully introduce a proof-of-concept for a wearable in-ear EEG biometrics in the community.

A. Biometric Systems With Verification/Identification
Depending on the context, the two categories of biometric systems are: 1) verification systems and 2) identification systems, summarised in Figure 1 [6]. Verification refers to validating a person's identity based on their individual characteristics, which are stored/registered on a server. In technical terms, this type of biometric system performs a one-to-one matching between the 'claimed' and 'registered' data, in order to determine whether the claim is true. In other words, the question asked for this application is 'Is this person A?' as illustrated in Figure 1 (top panel). In contrast, an identification system confirms the identity of an individual from cross-pattern matching of all the available information, that is, based on one-to-many template matching. The underlying question for this application is 'Who is this person?' as illustrated in Figure 1 (bottom panel).
Figure 1: Person recognition systems. Top: Verification system. Bottom: Identification system.
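The distinction between one-to-one and one-to-many matching can be illustrated with a minimal sketch. The template dictionary, the Euclidean distance, and the acceptance threshold below are illustrative assumptions and are not taken from the paper.

```python
import math

def verify(sample, claimed_id, templates, threshold):
    """One-to-one matching ('Is this person A?').

    `templates` maps a registered identity to a stored feature vector;
    both it and `threshold` are hypothetical placeholders.
    """
    return math.dist(sample, templates[claimed_id]) <= threshold

def identify(sample, templates):
    """One-to-many matching ('Who is this person?').

    Returns the registered identity whose template is closest to the sample.
    """
    return min(templates, key=lambda pid: math.dist(sample, templates[pid]))
```

Verification only ever touches the template of the claimed identity, whereas identification scans every registered template, so its cost grows with the size of the database.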

B. Feasible EEG Biometrics Design
Traditionally, EEG-based biometrics research has been undertaken based on both publicly available datasets [11] and custom recordings as part of research efforts [13]. However, most existing studies failed to rigorously address the key criterion, collectability, which is also related to repeatability. A large number of studies, especially those conducted at the dawn of EEG biometrics research, employed classification of the clients based on supervised learning with the training and validation data coming from the same recording trial. However, this experimental setup cannot truly evaluate the performance in identifying individual features, since such classification does not take into account the varying characteristics among multiple recording trials and recording days. In addition, EEG is prone to contamination by artefacts from subjects' movements (e.g. eye blinks, chewing), while the sources of external noise include electrode noise, power line noise, and electromagnetic interference from the surroundings. This opens the possibility of incorrectly associating 'EEG patterns' with either trial-dependent features or so-called noise-related features; in other words, this setup is biased in favour of a high classification rate. Therefore, given the notorious variability of EEG patterns across days, biometrics studies based on a single recording day (even for a single subject) can only validate very limited scenarios, without any notion of repeatability and long-term feasibility [13].
Figure 2 shows the concept of a rigorous EEG biometrics verification/identification system in the real world. Individuals participate in EEG recordings and their EEG signals are registered and stored on a server or in a database (left panel). In verification scenarios, the client is granted access to their account by providing new EEG data, whereas in identification scenarios the algorithm determines the identity of an individual from new EEG recordings. Recall that the registered EEG must be recorded beforehand.
In order to fulfil the feasibility requirement, several studies performed successful EEG-based biometrics from multiple recording trials conducted on multiple distinct days, thus satisfying the collectability requirement. However, the majority of these studies were still conducted in an unrealistic scenario, whereby the training and validation data in the classification process are split into segments, with all the segments coming from multiple trials but on the same recording day being randomly assigned to the training and validation datasets. Therefore, this biased setup, despite being based on the classification from multiple recording trials, mixes the training and test recordings from the same recording day and thus cannot truly evaluate the performance in the identification of individual features.
In order to truly validate the robustness of an EEG biometrics application, within a rigorous setup, it is therefore necessary to both: i) conduct multiple recordings over multiple days, and ii) assign the recordings from one day as the training data and use the recordings from the other days as the validation data. In other words, the training and validation datasets should be created so as not to share the same recording days (as illustrated later in Figure 6, Setup-R). As emphasised by Rocca et al., the repeatability of EEG biometrics across different recording sessions is still a critical open issue [14]. Table I summarises the state of the art in EEG biometrics applications based on multiple data acquisition days.
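The difference between a day-disjoint split and a split that mixes recording days can be sketched as follows. The trial records (dictionaries with a `day` key) and the leave-one-trial-out form of the biased split are assumptions made for illustration only.

```python
def split_rigorous(trials, train_day):
    """Train on all trials of one recording day, validate on the other days."""
    train = [t for t in trials if t["day"] == train_day]
    valid = [t for t in trials if t["day"] != train_day]
    return train, valid

def split_biased(trials, held_out):
    """Hold out a single trial; same-day trials remain in the training set."""
    valid = [trials[held_out]]
    train = trials[:held_out] + trials[held_out + 1:]
    return train, valid

def shares_recording_day(train, valid):
    """True if any recording day appears in both sets."""
    return bool({t["day"] for t in train} & {t["day"] for t in valid})
```

A check such as `shares_recording_day` makes the leakage explicit: it returns False for the rigorous split and True for the biased one whenever a day contains more than one trial.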

C. Previous Protocols
1) Biased Setup: The first category (Setup: biased) comprises the studies where the training and validation features were randomly selected regardless of the data acquisition days. Abdullah et al. [15] collected 4 channels of EEG data from 10 male subjects during the resting state, in both the eyes open (EO) and eyes closed (EC) scenarios, on 5 separate recording days over the course of 2 weeks. On each recording day, 5 trials of 30 s were recorded, and the recorded data were split into 5 s segments with an overlap of 50 %. The autoregressive (AR) coefficients of order p = 21 were extracted from each segment, and the extracted features were randomly divided into the training (90 %) and validation (10 %) sets, namely 10-fold cross-validation. An artificial neural network (ANN) yielded a correct recognition rate (CRR) of 97.0 % for the EC task, and a CRR of 96.0 % for the EO task. Riera et al. [16] recorded 2 forehead channels of EEG from 51 subjects over 4 separate recording days. The average temporal distance between the 1st and the 4th recording was 34 ± 74 days. The participants performed the EC task, and the duration of recordings was between 2 and 4 minutes. The recorded EEG was split into 4 s segments, and five types of features were calculated for each segment, namely AR coefficients of order p = 100, power spectral density (PSD) in the 1 − 40 Hz band, mutual information (MuI), coherence (COH), and cross-correlation (CC). The authors trained various classifiers and identified the 5 best classifiers. Fisher's discriminant analysis (FDA) was then employed and was first trained with different types of discriminant functions using the 1st to 3rd recording trials; then the 4th recording trials were used for testing.
Next, the best 5 classifiers from the training process were utilised for authentication tests, using the first and the second minutes of recordings from each trial; therefore, the training data for the classifiers and test data for the validation were not disjoint (biased setup). The discriminant analysis with the selected discriminant function achieved 3.4 % of equal error rate (EER). Su et al. [17] analysed 5 minutes of the EC task from 40 subjects, with 6 recording trials performed in 2 separate recording days for each subject. The recorded EEG data from the FP1 channel were split into segments of multiple lengths. The PSD in the 5 − 32 Hz band and AR coefficients of the order p = 19 were chosen as features. The extracted features were randomly divided into the training (50 %) and validation (50 %) sets. As a result of 100 iterations, the classifier combining Fisher's linear discriminant analysis (LDA) and k-nearest neighbours (KNN) achieved an average CRR = 97.5 %, for a segment length of 180 s.
2) Rigorous Setup: Multiple research groups considered EEG biometrics based on splitting the training and validation data in a rigorous way, so as not to share the data from the same recording days, in order to highlight the feasibility of their system (Setup: rigorous). Marcel and Millan [18] analysed 8 channels of EEG from 9 subjects, with 4 recording trials over 3 consecutive days. The 15 s trials consisted of two different motor imagery (MI) mental tasks, the imagination of hand movements. The recorded data were split into 1 s segments, and PSD in the 8 − 30 Hz band was calculated for each segment. The Gaussian mixture model (GMM) was chosen as a classifier, and maximum a posteriori (MAP) estimation was used for adapting a model to the client data. By combining recordings over two days as training data, the authors achieved a half total error rate (HTER) of 19.3 %; the HTER is a performance criterion widely used in biometrics, for more detail see Section III-G. Lee et al. [19] conducted an experiment of 300 s in duration with four subjects over two days, based on a single channel of EEG in the EC scenario. The data were segmented into multiple window sizes and, to extract frequency domain features, PSD was calculated only for the α band (8 − 12 Hz). Even though the dataset size was relatively small, with a segment length of 50 s, the LDA achieved 100 % classification accuracy. Rocca et al. [20] recorded two resting state EEGs, in both the eyes open (EO) and eyes closed (EC) scenarios, from 9 subjects over 2 different recording days, which were separated by 1 to 3 weeks. The recording length was 60 s, and the recorded data were split into 1 s segments with an overlap of 50 %. The AR model (Burg algorithm) of order p = 10 was employed for feature extraction, and the training and validation data were split without mixing the trials from different recording days.
The recognition results, with features from selected 3 or 5 channels of scalp EEG, were obtained by linear classification based on minimising the mean square error (MMSE), to achieve CRR = 100 %. Armstrong et al. [21] recorded event related potentials (ERPs) and constructed two datasets. One dataset included EEG recorded from 15 subjects in two separate days, with a 5 -40 inter-day interval, and the other one contained EEG from 9 subjects obtained in three separate days, with the average interval between the first and third recordings of 156 days. The recorded data from the O2 channel were split into 1.1 s long segments, which contained an ERP and started from a 100 ms pre-stimulus. The cross-correlation (CC) between the training and validation data was used as a feature for classification, and CRR = 89.0 % was achieved for validating the 2nd day recordings whereas CRR = 93.0 % for classifying the 3rd day recordings. Maiorana et al. [22] analysed 19 channels of EEG from 50 subjects during both EC and EO tasks in three different recording days, with the average interval between the first and the third recording of 34 days. Each recording trial consisted of 240 s of data, segmented into 5 s windows, with an overlap of 40 %. Three types of features were extracted, including a channel-wise AR model (using the Burg algorithm) of the order p = 12, channel-wise PSD, and the coherence (COH) between the EEG channels. The L1, L2, and cosine distances were calculated for the extracted features, and the rank-1 identification rate (R1IR) achieved 90.8 % accuracy in the EC task and 85.6 % in the EO task.
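The studies above report error measures such as the HTER. As a hedged illustration, the sketch below uses the usual definition of the HTER as the mean of the false acceptance rate (FAR) and the false rejection rate (FRR); the definition used in this paper is given in Section III-G, which is not reproduced here, and the function name is hypothetical.

```python
def error_rates(accepted, is_client):
    """Return (FAR, FRR, HTER) for a list of accept/reject decisions.

    accepted[i]  -- True if sample i was accepted by the system
    is_client[i] -- True if sample i is genuine (client), False if imposter
    """
    n_client = sum(is_client)
    n_imposter = len(is_client) - n_client
    false_reject = sum(1 for a, c in zip(accepted, is_client) if c and not a)
    false_accept = sum(1 for a, c in zip(accepted, is_client) if not c and a)
    frr = false_reject / n_client
    far = false_accept / n_imposter
    return far, frr, (far + frr) / 2.0
```

Because FAR and FRR are averaged with equal weight, the HTER is insensitive to the imbalance between the number of client and imposter samples.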

D. Biometrics Based on Collectable EEG Systems
From the perspective of collectability, a biometrics application with dry EEG electrodes was recently introduced [23]. While conventional wet EEG headsets require the application of a conductive gel, which is generally time-consuming, the dry headset with 16 scalp channels took on average 2 minutes to become operational. The brain-computer interface based biometrics application with the rapid serial visual presentation paradigm achieved CRR = 100 % with a 27 s window size over all 29 subjects. Although the recordings were performed over a single recording day per subject, the application with a dry headset was a step forward towards establishing collectable EEG biometrics in the real world.
In a recent effort to enable collectable EEG, the in-ear sensing technology [12] was introduced into the research community. The ear-EEG has been proven to provide on-par signal quality, compared to conventional scalp-EEG, in terms of steady state responses [12], [24], monitoring sleep stages [25], [26], and also for monitoring cardiac activity [27], [28]. The advantages of the in-ear EEG sensing for a potential biometrics application in the real world are:
• Unobtrusiveness: The latest 'off-the-shelf' generic viscoelastic EEG sensor is made from affordable/consumable standard earplugs [29],
• Robustness: The viscoelastic substrate expands after the insertion, so the electrodes fit firmly inside the ear canal [27], where the position of electrodes remains the same in different recording sessions,
• User-friendliness: The sensor can be applied straightforwardly by the user, without the need for a trained person.
Therefore, biometrics with ear-EEG offers a high degree of collectability, a critical issue in real-world applications. Previously, an in-ear EEG based biometrics application was proposed in [30], albeit only for the biased scenario.

E. Problem Formulation
We investigate the possibility of biometrics verification with a wearable in-ear sensor, which is capable of fulfilling the collectability requirement. The data were recorded over temporally distinct recording days, in order to additionally highlight the uniqueness and permanence aspects. Although the changes in EEG rhythms may well depend on a time period of years rather than days, the alpha band features during the resting state with eyes closed were reported as the most stable EEG feature over two years [31]. Since EEG alpha rhythms predominantly originate during wakeful relaxation with eyes closed, we chose our recording task to be the resting state with eyes closed. This task was used in multiple previous studies [15]-[17], [19], [20], [22]. In order to design a feasible biometrics application in the real world, we considered imposters in two different ways: i) registered subjects in a database, and ii) subjects not belonging to a database. Previously, Riera et al. [16] also used a single trial of EEG recording from multiple subjects as 'intruders', while the 'imposters' data were EEG recordings available from multiple other experiments. For rigour, we collected two types of data: 1) multiple recordings from fifteen subjects over two days, and 2) multiple recordings from five subjects, which were only used as imposters' data. The classification was performed by both a non-parametric and a parametric approach. The non-parametric classifier, minimum cosine distance, is the simplest way of evaluating the similarity between the training and validation matrices, whereas the parametric approach, the support vector machine (SVM), was tuned within the training matrix in order to find optimal hyper-parameters and weights for validation. The same hyper-parameters and weights were used for classifying the validation matrix. In addition, linear discriminant analysis (LDA) was also employed as a classifier. Through the binary client-imposter classification, we then evaluated the feasibility of our in-ear EEG biometrics.

A. Data Acquisition
The recordings were conducted at Imperial College London, for two different groups of subjects, under the ethics approval ICREC12_1_1 of the Joint Research Office at Imperial College London. One set of data comprised the recordings used as both clients' and imposters' data, denoted by S_R, and the other subset comprised the recordings used only as imposters' data, denoted by S_N. Table II summarises the two recording configurations of S_R and S_N.
For the S_R subset of recordings, fifteen healthy male subjects (aged 22-38 years) participated in two temporally separate sessions, with the interval between the two recording sessions between 5 and 15 days, depending on the subject. The participants were seated in a comfortable chair during the experiment, and were asked to rest with eyes closed. The length of each recording was 190 s, and the recording was undertaken three times (trials) per day. The interval between each recording trial was approximately 5 to 10 minutes. In total, six trials were recorded per subject. The in-ear sensor was inserted in the subject's left ear canal after earwax was removed; it then expanded to conform to the shape of the ear canal. The reference gold-cup standard electrodes were attached behind the ipsilateral earlobe and the ground electrodes were placed on the ipsilateral helix. For simplicity, the upper electrode is denoted by Ch1, while Ch2 refers to the bottom electrode, as shown in Figure 3 (left panel). The two EEG signals from flexible electrodes were recorded using the g.tec g.USBamp amplifier with a 24-bit resolution, at a sampling frequency f_s = 1200 Hz.
For the S_N subset of recordings, five healthy subjects (aged 22-29 years) participated in three recording trials. Similar to the S_R subset of recordings, the participants were seated in a comfortable chair, and were resting with eyes closed. The duration of each recording was also 190 s. A generic earpiece with two flexible electrodes [29] was inserted in the subject's left ear canal, and the same reference and ground configuration was utilised as for the S_R subset of recordings.
Similar to the setup in [22], there was no restriction on the activities that the subjects performed, and no checks on health-related factors, such as diet and sleep, were carried out before an EEG acquisition, between one acquisition and the following one, or during the days of the recordings. This lack of restrictions allowed us to acquire data in conditions close to real life.

B. Ear-EEG Sensor
The in-ear EEG sensor is made of a memory-foam substrate and two conductive flexible electrodes, as shown in Figure 3. The substrate material is a viscoelastic foam; therefore the 'one-fits-all' generic earpiece fits any ear regardless of its shape. The size of the earpiece was the same across the twenty subjects (both the S_R and S_N subjects). Further details of the construction of such a viscoelastic earpiece and its detailed recordings of various brain functions can be found in [27] and [29].

C. Pre-Processing
The two channels of the so-obtained ear-EEG were analysed based on the framework illustrated in Figure 4. In each recording, for both the S_R and S_N recordings, the first 5 s of recording data were removed from the analyses, in order to omit noisy recordings arising at the beginning of the acquisition. The two recorded channels of EEG were bandpass filtered with a fourth-order Butterworth filter with the passband 0.5 − 30 Hz. The bandpass filtered signals were split into segments. The symbol N denotes the number of segments per recording trial from both the S_R and S_N subsets. The lengths of segments were chosen as L_seg = 10, 20, 30, 60, 90 s. Therefore, when the segment length was L_seg = 60 s, N = 3 segments (190 − 5 = 185 s, and 185/60 rounded down gives 3) were extracted from every recording trial of the S_R and S_N subsets. Within each segment, the data were split into epochs of 2 s length. The epochs with amplitudes greater than 50 μV for either Ch1 or Ch2 were considered corrupted by artefacts and removed from the analyses. This method resulted in a loss of 4.3 % of the data, namely approximately 7.7 s out of 190 s per recording trial.
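The segment count and the amplitude-based epoch rejection described above can be sketched as follows. The function names are hypothetical, and the interpretation of "amplitude" as the absolute sample value in μV is an assumption; filtering is assumed to have been applied beforehand.

```python
def segments_per_trial(trial_len_s=190, discard_s=5, seg_len_s=60):
    """Number of whole segments N left after discarding the first seconds."""
    return (trial_len_s - discard_s) // seg_len_s

def clean_epochs(ch1, ch2, fs=1200, epoch_s=2, threshold_uv=50.0):
    """Split two channels into non-overlapping epochs and drop noisy ones.

    An epoch is dropped if the absolute amplitude in either channel exceeds
    `threshold_uv` (assumed reading of the 50 uV criterion).
    """
    n = int(fs * epoch_s)
    kept = []
    for start in range(0, min(len(ch1), len(ch2)) - n + 1, n):
        e1 = ch1[start:start + n]
        e2 = ch2[start:start + n]
        if max(abs(v) for v in e1) > threshold_uv:
            continue
        if max(abs(v) for v in e2) > threshold_uv:
            continue
        kept.append((e1, e2))
    return kept
```

With the default arguments, `segments_per_trial` yields N = 18, 9, 6, 3, 2 for L_seg = 10, 20, 30, 60, 90 s.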

D. Feature Extraction
After the pre-processing, two types of features were extracted from each segment of the ear-EEG. For a fair comparison with the state-of-the-art, these features were selected to be the same or similar to those used in the recent studies based on the resting state with eyes closed [19], [20], and included: 1) a frequency domain feature -power spectral density (PSD), and 2) coefficients of an autoregressive (AR) model.
1) The PSD Features: Figure 5 shows the power spectral density for the in-ear EEG Ch1 (left) and for the in-ear EEG Ch2 (right) of two subjects. For this analysis, the recorded signals were conditioned with a fourth-order Butterworth filter with the pass-band 0.5 − 30 Hz. The PSDs were obtained using Welch's averaged periodogram method [32], with a window length of 20 s and 50 % overlap. The PSDs overlap across different recording days (red: Day1, blue: Day2), as well as across different recording trials within the same recording day; this is especially visible from 3 to 20 Hz. Previously, Maiorana et al. utilised PSD features for EEG biometrics based on the resting state with eyes closed, and achieved the best performance with the PSD features from the theta to beta band, classified by the minimum cosine approach [22]; the inclusion of the delta band decreased their identification performance. In our in-ear EEG biometrics approach, the obtained PSDs were visually examined and we found that the ratio between the total α band (8 − 13 Hz) power and the total θ − α_high band (4 − 16 Hz) power is a more discriminative individual factor for biometrics than the total α band (8 − 13 Hz) power, which was proposed in [19]. Therefore, in each segment of length L_seg, the univariate PSD was calculated by Welch's method with a window length of 2 s and no overlap. Three features were obtained for each PSD: 1) the ratio between the total α band (8 − 13 Hz) power and the total θ − α_high band (4 − 16 Hz) power, 2) the maximum power in the α band, and 3) the frequency corresponding to the maximum of the α band power. In total, D = 6 (three features × two channels) frequency domain features were extracted from each segment.
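The three spectral features per channel can be computed from an already estimated PSD as in the following sketch. The function name, the summation of PSD bins as "total power", and the inclusive band edges are assumptions; the Welch estimation itself is not shown.

```python
def psd_features(freqs, psd):
    """Return (band power ratio, peak alpha power, peak alpha frequency).

    freqs -- frequency of each PSD bin in Hz
    psd   -- PSD value of each bin (same length as freqs)
    """
    def band(lo, hi):
        return [(f, p) for f, p in zip(freqs, psd) if lo <= f <= hi]

    alpha = band(8.0, 13.0)   # alpha band
    wide = band(4.0, 16.0)    # theta to high-alpha band
    ratio = sum(p for _, p in alpha) / sum(p for _, p in wide)
    peak_freq, peak_power = max(alpha, key=lambda fp: fp[1])
    return ratio, peak_power, peak_freq
```

With a 2 s Welch window the bin spacing is 0.5 Hz, so the α band spans 11 bins and the 4 − 16 Hz band spans 25 bins under the inclusive-edge assumption.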
2) The AR Features: The Burg algorithm [32] of order p = 10 was used to estimate the AR coefficients. For each segment, we applied univariate AR parameter estimation of its α band (8 − 13 Hz) with a window length of 2 s and no overlap. The AR model was chosen as a feature, because it was used in a previous successful study on EEG biometrics based on the resting state with eyes closed [20]. A total of D = 20 features (ten coefficients × two channels) were therefore extracted for each ear-EEG segment.
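For illustration, a textbook form of the Burg recursion is sketched below in plain Python. It is not the authors' implementation; the sign convention (prediction error e[n] = x[n] + a_1 x[n−1] + ... + a_p x[n−p]) and the function name are assumptions, and the α band filtering and 2 s windowing are omitted.

```python
import math

def burg_ar(x, order):
    """Estimate AR coefficients [1, a_1, ..., a_p] with Burg's method."""
    ef = list(x[1:])    # forward prediction errors
    eb = list(x[:-1])   # backward prediction errors, aligned with ef
    a = [1.0]
    for _ in range(order):
        # Reflection coefficient minimising forward + backward error power.
        num = -2.0 * sum(f * b for f, b in zip(ef, eb))
        den = sum(f * f for f in ef) + sum(b * b for b in eb)
        k = num / den
        # Levinson-type update of the coefficient vector.
        ext = a + [0.0]
        a = [ext[i] + k * ext[-1 - i] for i in range(len(ext))]
        # Update the error sequences and shrink them by one sample.
        ef, eb = ([f + k * b for f, b in zip(ef[1:], eb[1:])],
                  [b + k * f for f, b in zip(ef[:-1], eb[:-1])])
    return a
```

As a sanity check, a noiseless sinusoid with normalised angular frequency ω obeys x[n] − 2cos(ω) x[n−1] + x[n−2] = 0, so an order-2 fit should return coefficients close to [1, −2cos(ω), 1].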

E. Validation Scenarios
With the extraction of both the univariate AR and PSD features from two channels, the dimension D of features per EEG segment was twenty six. Recall that the first 5 s of recording data were removed from the analyses. For each trial of the S_R and S_N recordings, the data with the duration of 190 s were split into segments of length L_seg = 10, 20, 30, 60, 90 s. Therefore, N = 18, 9, 6, 3, 2 segments were respectively obtained. Each recording trial was represented by the feature matrix X_R for the S_R recordings and X_N for the S_N recordings, each of size N × D. In this way, a set of six feature matrices was obtained from one subject for the S_R recordings (three recording trials per day, over two different days), whilst a set of three feature matrices was obtained from one subject for the S_N recordings (three recording trials per day, one recording day). We next discuss the use of feature matrices in two different validation scenarios. As emphasised in the Introduction, we introduce feasible EEG biometrics which satisfy the collectability requirements, which are also related to repeatability. Therefore, for rigour, we used all feature matrices X_R from the 1st day of recordings as the training data, and the feature matrices from the 2nd day of recordings as the validation data, and vice versa (Setup-R). Our goal was to examine the robustness of the proposed approach over the two different time periods in Setup-R. For the second setup, Setup-B, the training feature matrices were also selected from the trials which were recorded on the same recording day as the validation matrices. Namely, the training and validation data were split by mixing the data from the same recording days. Notice that, although used in most available EEG biometrics studies [15]-[17], Setup-B could not evaluate the repeatability/reproducibility of the application, because the training and validation data were both from the same recording days.
In other words, such an approach benefits from the recording-day-dependent EEG characteristics in the classification. However, as the number of feasible biometric modalities with an in-ear EEG sensor is limited, and to allow comparison with other studies, we also provide the results for Setup-B. Figure 6 summarises the two validation scenarios, Setup-R and Setup-B. For clarity, we denote by V_C the validation feature matrix for the client, by V_I the validation feature matrix for the imposters, by T_C the training feature matrix for the client, and by T_I the training feature matrix for the imposters. The feature matrix from a single trial of the subject i, recording day j, and trial k, from the S_R recordings is denoted by X_R^(i,j,k). Then, the training feature matrix Y_T is formed by stacking T_C and T_I row-wise, Y_T = [T_C; T_I], and the validation matrix Y_V by stacking V_C and V_I, Y_V = [V_C; V_I]. In addition, in order to evaluate feasibility in the real world, we used the S_N recordings, which are EEG recordings used only for imposters; Riera et al. termed the imposter-only data 'intruders' [16]. For an additional scenario in both Setup-R and Setup-B (see Section IV-D), the S_N recordings were used as additional imposter validation data, with the corresponding feature matrices appended to the validation matrix Y_V. Table III summarises the properties of the training matrix and the validation matrix in both Setup-R and Setup-B.

F. Classification
For both Setup-R and Setup-B, every trial from every subject was selected in turn as the client validation data, giving ninety validations (three trials × two days × fifteen subjects). For each validation, the largest and smallest values of each feature (column-wise) were found in the training matrix, and the validation matrix was then normalised to the range [0, 1] based on these largest and smallest values. Three classification algorithms were employed: 1) a non-parametric approach, the minimum cosine distance [22], and 2, 3) parametric approaches, linear discriminant analysis (LDA) [33] and support vector machine (SVM) [34].
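The column-wise normalisation can be sketched as below. This is a minimal illustration under the assumption that no clipping is applied, so that validation values lying outside the training range map outside [0, 1]; the helper names `fit_min_max` and `apply_min_max` are hypothetical.

```python
def fit_min_max(train_rows):
    """Column-wise minimum and maximum of the training matrix (list of rows)."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lo, hi):
    """Scale each feature with the training minimum/maximum.

    Constant training features (max == min) are mapped to 0.0, which is an
    assumption made here only to avoid a division by zero.
    """
    return [
        [(x - a) / (b - a) if b > a else 0.0 for x, a, b in zip(row, lo, hi)]
        for row in rows
    ]


train = [[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]]
validation = [[1.0, 25.0], [5.0, 10.0]]

lo, hi = fit_min_max(train)
print(apply_min_max(validation, lo, hi))
```

The key point is that the scaling parameters come from the training matrix only, so no information from the validation day enters the normalisation.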
1) Cosine Distance: The cosine distance is the simplest way of evaluating the similarity between the rows of the validation matrix, $Y_V(l,:)$, where $l = 1, \ldots, 15N$ for $Y_V = Y_{V_R}$ and $l = 1, \ldots, 20N$ for $Y_V = Y_{V_R} + Y_{V_{IN}}$, and the training matrix, $Y_T$. In other words, the cosine distance is used for evaluating the similarity between a given test sample (e.g. the $l$th row of the validation matrix, $Y_V(l,:)$) and a template (training) feature matrix, $Y_T$. The distances between the $l$th row of the validation matrix and each row of the training matrix $Y_T$ were first computed, and the minimum among these distances was then selected.
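A minimal sketch of the minimum cosine distance rule is given below. It assumes the common convention that the cosine distance equals one minus the cosine similarity, which may differ in detail from the exact expression used in this study; the function names are hypothetical and all-zero feature vectors are not handled.

```python
import math


def cosine_distance(u, v):
    """One minus the cosine similarity of two feature vectors (assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_training_row(sample, train_rows):
    """Index of, and distance to, the training row closest to the sample."""
    dists = [cosine_distance(sample, row) for row in train_rows]
    n = min(range(len(dists)), key=dists.__getitem__)
    return n, dists[n]


def predict(sample, train_rows, train_labels):
    """Assign to the sample the label of its nearest training row."""
    n, _ = nearest_training_row(sample, train_rows)
    return train_labels[n]


train_rows = [[1.0, 0.0], [0.0, 1.0]]
train_labels = ["client", "imposter"]
print(predict([0.9, 0.1], train_rows, train_labels))
```

With binary labels this gives the client-imposter verification; with one label per subject the same rule gives identification.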
2) LDA: The binary-class LDA was employed as a classifier. The LDA finds a linear combination of parameters to separate given classes. The LDA projects the data onto a new space, and discriminates between two classes by maximising the between-class variance while minimising the within-class variance.
3) SVM: The binary-class SVM was employed as a parametric classifier [34]. For both Setup-R and Setup-B, four hyper-parameters: type of kernel, regularisation constant for loss function C, inverse of bandwidth γ of kernel function, and order of polynomial d, were tuned by 5-fold cross-validation within the training matrix. Then, the same hyper-parameters were used in order to obtain the optimal weight parameters within the training matrix. The same hyper-parameters and weight parameters as in the training were used for validation. Table IV summarises the hyper-parameters for SVM.

G. Performance Evaluation
Feature extraction and classification with the minimum cosine distance and with LDA were performed using Matlab 2016b, and the classification with SVM was conducted in Python 2.7.12 Anaconda 4.2.0 (x86_64), on an iMac with a 2.8 GHz Intel Core i5 and 16 GB of RAM.
For the verification setup (the number of classes $M = 2$, client-imposter classification), the performance was evaluated through the false accept rate (FAR), false reject rate (FRR), half total error rate (HTER), accuracy (AC), and true positive rate (TPR), defined as:
$$\mathrm{FAR} = \frac{FP}{FP + TN}, \quad \mathrm{FRR} = \frac{FN}{TP + FN}, \quad \mathrm{HTER} = \frac{\mathrm{FAR} + \mathrm{FRR}}{2}, \quad \mathrm{AC} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{TPR} = \frac{TP}{TP + FN}.$$
The parameter TP (true positive) represents the number of positive (target) segments correctly predicted, TN (true negative) is the number of negative (non-target) segments correctly predicted, FP (false positive) is the number of negative segments incorrectly predicted as the positive class, and FN (false negative) is the number of positive segments incorrectly predicted as the negative class. For the identification setup ($M = 15$), the performance was evaluated by the subject-wise sensitivity (SE), that is, the fraction of the segments of each subject which were correctly identified, the identification rate (IR), that is, the fraction of all $N_{segment}$ segments which were correctly identified, and Cohen's kappa ($\kappa$) coefficient, where $N_{segment}$ is the total number of segments.
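As a worked example, the verification metrics can be computed from the four confusion counts. The sketch below uses the standard definitions of these rates, which are assumed here to match those of this study. The counts are illustrative: TP = 183 and FN + FP = 174 are the values reported for $L_{seg} = 60$ s in Setup-R, whereas the split FN = FP = 87 and TN = 3693 are derived under the assumption of 270 client and 3780 imposter segments (ninety validations, $N = 3$ segments per trial, fourteen imposters per client).

```python
def verification_metrics(tp, tn, fp, fn):
    """Standard verification rates from confusion counts (client = positive class)."""
    far = fp / (fp + tn)  # imposters accepted as client
    frr = fn / (tp + fn)  # clients rejected as imposter
    return {
        "FAR": far,
        "FRR": frr,
        "HTER": (far + frr) / 2,
        "AC": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tp / (tp + fn),
    }


# Illustrative counts; FN, FP and TN are derived assumptions (see the text above).
metrics = verification_metrics(tp=183, tn=3693, fp=87, fn=87)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f} %")
```

Because imposter segments outnumber client segments fourteen to one, AC is dominated by the imposter class, while HTER weights the two error types equally.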

IV. RESULTS
The biometric verification results within a one-to-one client-imposter classification problem are next summarised. In terms of the verification, we considered the following scenarios:
• Client-imposter verification based on varying segment lengths $L_{seg}$ (Section IV-A),
• Verification with various combinations of features (Section IV-B),
• Verification across different classifiers, both non-parametric and parametric ones (Section IV-C),
• Verification of registered clients and imposters ($S_R$), and of non-registered imposters ($S_N$) (Section IV-D),
• Subject-wise verification (Section IV-E).
We also considered biometric identification, that is, a one-to-many subject-to-subject classification problem (Section IV-F). Table V summarises the details of the considered scenarios.
For the verification with varying segment lengths (Section IV-A), the client data are as specified in Table III, and the imposter data are selected from the other 'non-client' subjects, $Y_{V_I} \in \mathbb{R}^{14N \times D}$ in Table III. The ratio between $Y_{V_C}$ and $Y_{V_I}$ is therefore 1 : 14, and thus the chance level is 14/15. In Setup-R, the results with $L_{seg} = 60$ s achieved both the best HTER score, 17.2 %, and the best accuracy (AC), 95.7 %. Notice that the number of TP (=183) is larger than FN + FP (=174) with $L_{seg} = 60$ s; therefore, a true positive verification is more likely than a false verification. In Setup-B, the results with $L_{seg} = 90$ s achieved both the best HTER score, 6.9 %, and the best accuracy (AC), 98.3 %.

B. Client-Imposter Verification With Different Features
Table VII shows the validation results in Setup-R over a range of different selections of features, namely AR coefficients, frequency band power, and the combination of AR and band power features, for the segment length of $L_{seg} = 60$ s. The classification results using both the AR and PSD features were the best in terms of both HTER and AC, and correspond to Table VI (upper panel) for $L_{seg} = 60$ s.
For the verification across different classifiers (Section IV-C), Table VIII shows the client-imposter verification accuracy based on the minimum cosine distance, LDA, and SVM, for both Setup-R and Setup-B, with a segment size of $L_{seg} = 60$ s. Both the PSD and AR features were used. In Setup-R, the results with the cosine distance were the best in terms of both the HTER score, 17.2 %, and AC, 95.7 %. In Setup-B, both HTER and AC were the best for the SVM classifier, 5.5 % and 99.0 %, respectively.

D. Validation Including Non-Registered Imposters
Table IX summarises the confusion matrices of both Setup-R and Setup-B with segment size $L_{seg} = 60$ s, classified by the minimum cosine distance, LDA, and SVM; these correspond to Table VIII, panels Setup-R and Setup-B. The confusion matrices were categorised into the client matrix $Y_{V_C}$ and the imposter matrix $Y_{V_I}$, both from dataset $S_R$, and the non-registered imposter matrix $Y_{V_{IN}}$ from dataset $S_N$. Notice that the minimum cosine distance approach assigns to each validation segment the class (client or imposter) of its nearest segment in the training matrix. In this study, every trial from every subject was selected in turn as the client validation data, giving ninety validations; therefore, the subject in dataset $S_R$ that provides the nearest segment to an imposter segment from dataset $S_N$ is itself the 'client' for one of the fifteen subjects. Hence, whenever the nearest training segment to an $S_N$ segment is labelled 'client', that $S_N$ imposter segment is classified as 'client'. Therefore, regardless of the data, the TPR for the imposter matrix $Y_{V_{IN}}$ is 93.3 % (14/15) for the minimum cosine distance approach; for comparison among different classifiers, these results are nevertheless included.
In Setup-R, the TPR of the client $Y_{V_C}$ achieved by the minimum cosine distance was the highest, with a value of 67.8 %. However, the TPRs obtained by SVM for the imposters $Y_{V_I}$ and $Y_{V_{IN}}$ were 98.6 % and 96.2 %, respectively, which were higher than those achieved by LDA. In Setup-B, both the TPR of the client $Y_{V_C}$ and those of the imposters $Y_{V_I}$ and $Y_{V_{IN}}$ were the highest for SVM, with respective values of 89.3 %, 99.7 % and 96.3 %.

F. Biometrics Identification Scenarios
Table X (right column) summarises the subject-wise identification rate obtained by the minimum cosine distance classifier with the PSD and AR features from $L_{seg} = 60$ s segments in Setup-R. Previously, we considered a binary client-imposter classification problem (i.e. $M = 2$) for each subject, each day, and each trial; however, the classification algorithm used was the simple minimum cosine distance between the validation matrix and the training matrix. For the prediction of the $l$th row of the validation matrix, $Y_V(l,:)$, the row of the training matrix $Y_T$ at the minimum distance was found, say the $n$th row, and the label of that $n$th training row was assigned as the predicted label for $Y_V(l,:)$. Notice that the minimum distance approach is therefore also applicable to biometrics identification problems, that is, to one-to-many classification. The number of classes was $M = 15$, which corresponds to the number of subjects in the $S_R$ recordings; therefore the chance level was 1/15 = 6.7 %. The achieved identification rate was 67.8 % with $L_{seg} = 60$ s segments in Setup-R, while the achieved Cohen's kappa coefficient was $\kappa = 0.65$ (substantial agreement) [35]. Figure 7 shows the identification rates of both Setup-R and Setup-B, with different segment sizes $L_{seg} = 10, 20, 30, 60, 90$ s. In Setup-B, the identification rate with $L_{seg} = 10$ s was 67.7 %, which was almost the same as the result with $L_{seg} = 60$ s in Setup-R. The highest identification rate, 87.2 %, was achieved in Setup-B with $L_{seg} = 90$ s.

V. DISCUSSION
This study aims to establish a repeatable and highly collectable EEG biometrics using a wearable in-ear sensor. We considered a biometric verification problem, which was cast into a one-to-one client-imposter classification setting. Notice that, as described in Section III-F, before classification, the validation matrix was normalised column-wise to the range [0, 1] using the corresponding maximum/minimum values of the training matrix.

A. Verification With Different Segment Sizes
Firstly, the classification results were compared for different segment lengths $L_{seg}$, as shown in Table VI. Within the same setup, i.e. Setup-R or Setup-B, both HTER and AC improved with the segment length, although the results with $L_{seg} = 60$ s and $L_{seg} = 90$ s are almost the same. Longer segments allowed for more data epochs to be averaged over; hence the interference of EEG noise with the classification diminished and the inherent EEG characteristics could be captured by averaging. However, a longer segment length also implies a longer recording time, which is not ideal for feasible EEG biometrics. Comparing the results in Setup-R and Setup-B for the same segment length, both the HTERs and the accuracies (ACs) of Setup-B were clearly better than those of Setup-R. In terms of client discrimination, the decrease in FRR from Setup-R to Setup-B was considerable. In Setup-B, a larger number of client segments was correctly classified (see TP) than in Setup-R. In Setup-R, only the result with $L_{seg} = 60$ s achieved TP > (FN + FP), which indicates that a true positive verification is more likely than a false verification; in Setup-B, even the shortest segment size, $L_{seg} = 10$ s, achieved TP > (FN + FP).
The difference between Setup-R and Setup-B was that the training matrices $Y_T$ in Setup-B included trials which were recorded on the same day as the validation trial. In other words, the assigned validation data (trial) and part of the assigned training data (trials) were recorded within 5-10 minutes of each other in the same environment. Therefore, in Setup-B the training matrix contains EEG recordings that are very similar to those in the validation matrix, and this leads to a higher classification performance than in Setup-R.
With an increase in the segment size $L_{seg}$, the number of segments per recording trial, $N$, became smaller, in particular $N = 2$ for the segment size $L_{seg} = 90$ s; therefore, the training matrix only contains six (2 × 3 trials) examples of client data in Setup-R (cf. ten examples in Setup-B), which might not be enough for training on client data. Hence, the performance with $L_{seg} = 90$ s was slightly lower than that with $L_{seg} = 60$ s in Setup-R.

B. Verification With Different Classifiers
Table VIII shows the classification comparison among the minimum cosine distance, LDA and SVM. The SVM was used as a parametric classifier; firstly, the optimal hyper-parameters (see details in Table IV) were selected by 5-fold cross-validation within the training matrix, and then the weight parameters based on these chosen hyper-parameters were obtained. Notice that the classifier could be tuned in different ways, e.g. in order to minimise false acceptance or to minimise false rejection. The tuning in this study was performed so as to maximise the class sensitivities, i.e. to maximise the number of TP and TN elements, which resulted in minimum HTERs. In both Setup-R and Setup-B, the FARs by SVM were smaller than those achieved by both the minimum cosine distance and the LDA, because the tuning was performed so as to maximise the TN elements. Since the number of imposter elements was fourteen times larger than the number of client elements in both Setup-R and Setup-B (i.e. the chance level was 14/15), the SVM parameters were tuned for higher sensitivity to imposters. As a result, the FRRs by SVM, which are related to the client sensitivity given in Table IX, were higher than those achieved by both the minimum cosine distance and LDA in Setup-R.
In Setup-B, as mentioned above, the training matrix contains data from the same recording day as the validation data, which exhibit more similar EEG patterns than data obtained on a different recording day. Therefore, the SVM model chose hyper-parameters and weight parameters from the training matrix which better fit the validation data in Setup-B, and this led to higher performance than that of both the minimum cosine distance and LDA.
Notice that, as described before, tuning of the hyperparameters was performed within the training matrix, then the so-obtained hyper-parameters were used for finding the optimal weight parameters within the training matrix. The same hyper-parameters and weight parameters were used for classifying the validation matrix. This setup is applicable for feasible EEG biometrics scenarios in the real-world.

C. Validation Including Non-Registered Imposters
In Table IX, the confusion matrices for the client matrix $Y_{V_C}$ and the imposter matrix $Y_{V_I}$ from dataset $S_R$, and for the imposter matrix $Y_{V_{IN}}$ from dataset $S_N$, are given; these were used for a comparison between Setup-R and Setup-B. Comparing Setup-R and Setup-B for the same classifier (minimum cosine distance, LDA, SVM), the true positive rate (TPR) of the clients $Y_{V_C}$ and the sensitivity of the imposters $Y_{V_I}$ from dataset $S_R$ were higher in Setup-B than in Setup-R. As described before, in Setup-B the two client trials from the same recording day as the validation trial were included in the training matrix, and therefore more segments were correctly classified. In contrast, the TPRs of the imposters $Y_{V_{IN}}$ from dataset $S_N$ by both the LDA and SVM in Setup-B were almost the same as those in Setup-R: 89.9 % and 88.7 % for LDA, 96.2 % and 96.3 % for SVM. Comparing the TPRs of the two sets of imposter data ($Y_{V_I}$ and $Y_{V_{IN}}$), regardless of the classifier, the TPRs of $Y_{V_I}$ were higher than those of $Y_{V_{IN}}$. Since the imposter data from $S_N$ were not included in the training matrix, more $S_N$ data than $S_R$ data were misclassified as 'client'.
Fig. 8. Power spectral density for the in-ear EEG Ch1 (left) and the in-ear EEG Ch2 (right) of Subject 8. The thick lines correspond to the averaged periodograms obtained from all recordings on the 1st day (red) and the 2nd day (blue), whereas the thin lines are the averaged periodograms obtained from single trials.
However, in real-world scenarios for biometrics, imposters are not always 'registered'. The lower TPR for $Y_{V_{IN}}$ means that the application is vulnerable to attacks from non-registered subjects. One potential way to overcome this vulnerability of the minimum cosine distance classifier is to introduce a threshold for classification: if the nearest distance is larger than a given distance threshold, the segment is excluded from the classification or is classified as an imposter.
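The thresholding idea can be sketched as follows. This is only an illustration of the suggested extension, not a method evaluated in this study; it assumes the cosine distance is one minus the cosine similarity, and the threshold value, labels and function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """One minus the cosine similarity of two feature vectors (assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def verify_with_threshold(sample, train_rows, train_labels, threshold):
    """Nearest-template decision with rejection of samples far from every template."""
    dists = [cosine_distance(sample, row) for row in train_rows]
    n = min(range(len(dists)), key=dists.__getitem__)
    if dists[n] > threshold:
        # No registered template is close enough: treat the sample as an imposter.
        return "imposter"
    return train_labels[n]


train_rows = [[1.0, 0.0], [0.0, 1.0]]
train_labels = ["client", "imposter"]
threshold = 0.1  # hypothetical value; in practice it would need to be tuned

print(verify_with_threshold([1.0, 0.05], train_rows, train_labels, threshold))
print(verify_with_threshold([1.0, 1.0], train_rows, train_labels, threshold))
```

A stricter threshold would reject more non-registered imposters at the cost of rejecting more genuine client segments, so it trades FAR against FRR.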

D. Client-Imposter Verification Results per Subject
For subject-wise classification, Table X summarises the classification results obtained by the minimum cosine distance in Setup-R, for different training-validation scenarios. The results varied across subjects and training-validation configurations, from 91.1 % to 100 % in AC and from 0.0 % to 35.8 % in HTER.
The size of the viscoelastic earpiece was the same for all twenty subjects (both $S_R$ and $S_N$ subjects), and all the subjects were able to wear it comfortably. The upper bounds of the electrode impedance over the three recordings per day of each participant are given in Table X (Impedance part). The highest performance was achieved by Subject 1, with maximum impedances of 9 kΩ and 10 kΩ for the 1st and 2nd recording day, respectively. Even though the impedances for Subject 2 and Subject 5 were smaller than those for Subject 1, the corresponding performance was below the average over the fifteen subjects. Besides, the lowest performance was exhibited by Subject 8, for whom the impedances were smaller than 11 kΩ for Day 1 and 11 kΩ for Day 2. Figure 8 shows the average PSDs for Subject 8; observe that the PSDs for the EEG recorded on Day 2 (blue) are slightly larger than those on Day 1 (red).

E. Biometrics Identification
In terms of biometrics identification results, a one-to-many subject-to-subject classification problem, the average sensitivity over fifteen subjects, i.e. the identification rate, was 67.8 % in Setup-R with L seg = 60 s, as shown in Table X (right column). Figure 7 illustrates the identification rates of both Setup-R and Setup-B, with different segment sizes L seg = 10, 20, 30, 60, 90 s. The identification rate increased with segment length, although the results with L seg = 60 s and L seg = 90 s are almost the same.
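For reference, the identification rate and Cohen's kappa can both be obtained from an $M \times M$ confusion matrix. The sketch below uses the standard definition $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement (here the identification rate) and $p_e$ the agreement expected by chance; the small two-class matrix is illustrative only and is not taken from the results.

```python
def identification_rate_and_kappa(confusion):
    """Identification rate and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts the segments of true class i predicted as class j.
    """
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement: product of the marginal proportions, summed over classes.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return p_o, (p_o - p_e) / (1 - p_e)


# Illustrative two-class example (not data from this study).
rate, kappa = identification_rate_and_kappa([[8, 2], [2, 8]])
print(rate, kappa)
```

When every subject contributes the same number of segments, $p_e = 1/M$; with $M = 15$ and an identification rate of 67.8 %, this definition gives a kappa close to the reported value of 0.65.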
Notice that the performances with $L_{seg} = 10$ s in Setup-B and $L_{seg} = 60$ s in Setup-R were almost the same, 67.7 % and 67.8 %, respectively. The highest identification rate in Setup-B was 87.2 % with $L_{seg} = 90$ s. Indeed, the training matrix for Setup-B included trials which were recorded on the same day as the validation trial; therefore the performance was better than in Setup-R.
Fig. 9. Power spectral density for the in-ear EEG Ch1 (left) and the in-ear EEG Ch2 (right) of one subject. The thick lines correspond to the averaged periodograms obtained from the recordings of the 'sleepy' trial (red) and the 'normal' trial (blue). Observe that the alpha power attenuated during the 'sleepy' trial.
In a previous biometrics identification study, Maiorana et al. [22] analysed 19 channels of EEG during EC tasks on three different recording days, and achieved a rank-1 identification rate (R1IR) of 90.8 % for a segment length of 45 s. Notice that it is hard to compare this performance with that of our approach, because the numbers of channels were very different: 19 scalp EEG channels covering the entire head versus our 2 in-ear EEG channels embedded on an earplug. Therefore, although our results were lower, the proof-of-concept in-ear biometrics emphasised the collectability aspect in fully wearable scenarios.

F. Alpha Attenuation in the Real-World Scenarios
One limitation of using the alpha band is its sensitivity to drowsiness, a state in which the alpha band power is naturally reduced. For illustration, Figure 9 shows the PSD obtained from a subject, calculated by Welch's averaged periodogram method. The subject slept during one recording; the subject was then woken up and another recording started less than 10 minutes after the first recording. The PSD graphs in Figure 9 overlap except for the alpha band; the alpha power observed during the 'sleepy' recording trial was smaller than that during the 'normal' recording, thus demonstrating the alpha attenuation due to fatigue, sleepiness, and drowsiness. The alpha attenuation is well known in sleep medicine research [26], [36], where it is used in particular to monitor sleep onset.

VI. CONCLUSION
We have introduced a proof-of-concept for a feasible, collectable and reproducible EEG biometrics in the community by virtue of an unobtrusive, discreet, and convenient to use in-ear EEG device. We have employed robust PSD and AR features to identify an individual, and unlike most of the existing studies, we have performed classification rigorously, without mixing the training and validation data from the same recording days. We have achieved HTER of 17.2 % with AC of 95.7 % with segment sizes of 60 s, over the dataset from fifteen subjects.
The aspects that need to be further addressed in order to fulfil the requirements for 'truly wearable biometrics' in the 'real-world' will focus on extensions and generalisations of this proof-of-concept to cater for:
• Intra-subject variability with respect to the circadian cycle and the mental state, such as fatigue, sleepiness, and drowsiness;
• Additional feasible recording paradigms, for example, evoked response scenarios;
• Truly wearable scenarios with mobile and affordable amplifiers;
• Inter- and intra-subject variability over the period of months and years;
• Fine tuning of the variables involved in order to identify the optimal features and parameters (segment length, additional EEG bands).