Wavelet-Based Semblance Methods to Enhance the Single-Trial Detection of Event-Related Potentials for a BCI Spelling System

Based on similarity measures in the wavelet domain under a multichannel EEG setting, two new methods are developed for single-trial event-related potential (ERP) detection. The first method, named “multichannel EEG thresholding by similarity” (METS), simultaneously denoises all of the information recorded by the channels. The second approach, named “semblance-based ERP window selection” (SEWS), presents two versions to automatically localize the ERP in time for each subject to reduce the time window to be analysed by removing useless features. We empirically show that when these methods are used independently, they are suitable for ERP denoising and feature extraction. Meanwhile, the combination of both methods obtains better results compared to using them independently. The denoising algorithm was compared with classic thresholding methods based on wavelets and was found to obtain better results, which shows its suitability for ERP processing. The combination of the two algorithms for denoising the signals and selecting the time window has been compared to xDAWN, which is an efficient algorithm to enhance ERPs. We conclude that our wavelet-based semblance method performs better than xDAWN for single-trial detection in the presence of artifacts or noise.


Introduction
Brain-computer interface (BCI) research endeavours to provide new ways of communication for severely handicapped people by translating their brain activity into commands that can be used in a computer or other devices, without using the standard peripheral nerves and muscular pathways. In particular, brain-computer interfaces are control and communication systems that are designed to assist people with motor disabilities, such as people suffering from amyotrophic lateral sclerosis (ALS), spinal cord injury, multiple sclerosis, muscular dystrophies, and cerebral palsy. In this paper, we will focus on the noninvasive BCI [1].
Electroencephalography (EEG) is a noninvasive way of measuring over the scalp the electrical activity occurring as the product of the interactions of neurons in the brain [2]. EEG recordings are usually overlapped with noise and artifacts such as muscle activity. eir presence hinders EEG detection and requires novel methods to remove them from the underlying true brain signal. One important assumption about noise is that it is supposed to be independent of brain activity. It is assumed that brain signals are (almost) instantaneously recorded by the electrodes, implying that each recorded channel is highly correlated with the others. Accordingly, it is possible to conclude those signal components that are not correlated over channels are assumed to be noise or artifacts and can be removed from the recorded signals. A comprehensive survey of signal-processing algorithms for EEG applied to the BCI can be found in [3].
Event-related potentials (ERPs) are neural activities generated involuntarily as a consequence of the occurrence of an expected but rare event. A deflection appears in the EEG signal with a specific polarization and latency. For example, P300 is a cognitive ERP with a positive peak at 300 ms after the stimulus. ERP can potentially be detected with signal-processing techniques and used as a control command in BCI applications, such as the very well-known P300 speller proposed by Farwell and Donchin [4].
Unfortunately, major obstacles still exist in the use of EEG for brain-computer interfaces: signal changes to be detected are very small and high noise such as the signal depletion due to the skull or muscular artifacts is present. To enhance and detect the ERP response, it is necessary to repeat the stimuli and average the responses; however, this reduces the information transfer rate. Several efforts to reduce the number of averaged trials or to directly perform single-trial ERP detection have been made [5][6][7][8]. e intention of improving the single-trial classification performance ranges from increasing the database, adding artificial trials [9], where original data are deformed to create new artificial patterns, to EEG source reconstruction or transfer learning methods, where data from multiple users are used to train classifiers to improve the BCI system [10,11]. e single-trial approach saves time, but the signal-to-noise ratio (SNR) is very low, making ERP detection difficult. On the contrary, when the stimuli are repeated to enhance the ERP detection, these repetitions may become tedious and tiring for the user, and the averaging technique decreases the speed of the spelling. Another problem related to the average of repetitions is the assumption of stationarity. Latency jitter, amplitude variability, or phase artifacts between single trials can cause a flattening up to the elimination of transient characteristics [12,13].
Usually, in P300-based BCI systems, a temporal window is manually selected [14]. is window is usually chosen to be large enough (within a range of [0, 1] second) to ensure including the ERP components under the study, independently of the user reaction time to the stimulus. However, the ERP responses have different latencies (and amplitudes) for each subject, which comprise irrelevant data to be covered in the temporal window. is can increase both the difficulty of training a classifier with irrelevant variables and the complexity of detecting the ERP. e variance among trials can provide information on the subject's cognitive state, allowing comparisons to be made between subjects [15].
Several studies have shown that it is possible, yet difficult, to distinguish single-trial signals from the EEG background [16][17][18]. A popular technique is the xDAWN algorithm [19] which automatically enhances the ERP for classification by using spatial filters, combining the multichannel information to put aside useless components. An exhaustive comparative study of several classification techniques is given in [20]. A review of state-of-the-art methods for singletrial detection of event-related potentials can be found in [21]. Finally, a complete survey of the issues that should be considered when designing a new P300-based BCI paradigm can be found in [22].
Wavelets theory has been used in several studies for P300 detection [23][24][25]. In particular, some advances using wavelets for single-trial detection can be found in [16,26], where automatic denoising methods are recommended. e fundamental hypothesis of wavelet denoising is that large coefficients correspond to the signal and small coefficients correspond to the noise [27]. e problem with current methods is that they can only denoise one channel at a time, regardless of the information on other channels. is causes them to lose the information provided by the ensemble, such as phase and amplitude information. In ERP studies, the most common mother wavelets that are used are the quadratic B-spline [28][29][30], the Daubechies wavelets Db4 and Db8 [31], the Symlet wavelet [31,32], and Coiflet [33]. For example, in [34], a single-trial P300 detection algorithm is presented based on independent component analysis (ICA) and wavelets. Nevertheless, despite these advances, singletrial P300 detection still needs to be improved before it can be made more available for the general public.
In this paper, we introduce a novel method to denoise, localize, and isolate ERPs combining two approaches based on wavelet theory. is formalism is used to study singletrial brain signals based on similarity measures. e first approach simultaneously denoises the signals by using the phase information provided by all the channels in a single trial. Afterward, the second approach combines the phase and the amplitude information of the signals to optimize the time window of the ERP for each user. e rest of this paper is organized as follows: In Section 2, we presented wavelet theory and semblance analysis to introduce our proposal of using the correlated information of recorded channels to remove noise and automatically establish an appropriate time window for the analysis of each subject. e results are provided in Section 3 and the discussion in Section 4. Finally, our conclusions are given in Section 5.

Wavelet Transforms.
e wavelet transform is the inner product of a signal x(t) with scaled and shifted versions of a mother wavelet ψ(t) ∈ L 2 (R) function [35]. e continuous wavelet transform (CWT) uses a continuous wavelet function for the signal analysis: where a and b change continuously and ψ * is the complex conjugate of ψ. e CWT coefficients measure the variation of x in a neighborhood of point b, whose size is proportional to a, obtaining a mapping of a one-dimensional signal into a two-dimensional space. On the contrary, the discrete wavelet transform (DWT) uses filter banks to obtain a multiresolution time-frequency representation. More precisely, the discrete orthogonal wavelet decomposition is obtained using a discretised scale and translation.

Semblance Analysis.
Wavelet analysis is also useful for bivariate analysis, making it possible to study two different signals to discover the relationship between them. Crosswavelet analysis allows us to find the mutual characteristics between signals using the available information in the wavelet transform. e cross-wavelet spectrum [36] of two different signals x(t) and y(t) is defined by their wavelet decompositions W x ψ and W y * ψ as follows: where W xy ψ is a complex value and can be decomposed into amplitude |W x ψ W y * ψ | and phase θ � tan − 1 (I(W xy ψ )/R (W xy ψ )). Semblance analysis [37] was introduced to compare two given signals x(t) and y(t) based on the phase correlations between their wavelet decompositions W x ψ and W y ψ using θ: where n is an odd integer that is greater than zero. e reason why n should be odd is to preserve the sign of the cosine. e use of large numbers for n also produces a sharp semblance response, as demonstrated in [37]. e values of S ∈ [− 1, 1] correspond to the phase correlation between the two signals, where S � 1 means they are fully correlated, S � − 1 means they are fully inversely correlated, and S � 0 means they are not correlated. Also, it is possible to analyse the signal's amplitudes combining the phase information S with the amplitude information as follows:

Multisemblance Analysis.
e semblance concept was extended to compare more than two signals at the same time.
is measure is called the mean resultant length (MRL), and it was presented by Cooper in [38] based on circular statistics [39]. e MRL can be calculated according to the number N of signals treated, at each scale a and time t: .
(6) For more than two signals, the inversely correlated concept does not apply. is is verified by the MRL values ranging from 0 for uncorrelated signals to 1 for correlated signals.

Denoising Based on the Similarity in the Channels.
We propose to denoise EEG signals considering the information of all channels based on their phase and correlations within the DWT transform. Let X be the matrix containing the whole dataset, and let x c (t) be the signal recorded by the c th electrode, c ∈ 1, . . . , C { }, at time t, t ∈ 1, . . . , T { }. e matrix of the recorded EEG signals can be defined as X ∈ R T×C . e denoising, through thresholding, can be done using the MRL coefficients, i.e., using coefficients obtained for all channels (equation (6)), instead of the individual channel coefficients. We prune all the coefficients that are below a given threshold τ d to zero in order to reconstruct the signal using the filtered wavelet coefficients. e MRL computation is done through the combination of the phase angles of the real and imaginary parts of the wavelet decomposition. DWT uses wavelet families that are orthogonal to each other so that the imaginary part can be achieved by the Hilbert transform of the channel [38].
In simple words, we keep those components with high similarity between channels to produce a denoised EEG signal.
is novel approach is called multichannel EEG thresholding by similarity (METS), and the full wavelet denoising process is described in Algorithm 1.

Temporal Window Selection.
We propose to compute a variable time window, by comparing the averages of the target and nontarget signals. Two alternatives are proposed: by channels or by the grand averages over channels. e purpose is to have a better expressiveness of both amplitude and phase to isolate the ERP wave. e hypothesis is that the ERP is not correlated with the EEG background activity, and this should be reflected in the semblance analysis.
Our approach independently computes the correlation between the averages for each channel or the grand averages over channels, using the continuous wavelet transform, to obtain a continuous trend in time of the correlations by each scale. According to Kolev et al. [40], the most significant differences between target and nontarget responses in the frequency domain are found in delta and theta brain rhythms. Consequently, the scales that are used to analyse the signal averages were selected according to these rhythms. Figure 1(a) presents the dot product obtained by applying equation (5) to the grand average where it is easy to identify the less correlated components in cold colors and the more correlated components in warmer colors. ese warmer colors correspond to the EEG background, yet the correlation is not uniform between scales. It is possible to localise the ERP (in blue) within a thinner time window compared to the original. However, the ERP is not present in all scales, making difficult an automatic detection of this temporal window. Fortunately, the ERP produces a high variability over the scales at a specific time on the dot product D given by equation (5), which leads to analysis of the standard deviation for each time point. As the magnitudes of the wavelet coefficients can be different for each subject, we normalise the standard deviation between 0 and 1. Finally, we select the temporal window utilising a predefined threshold, as shown in Figure 1 is technique is completely independent of the previous denoising step introduced in Section 2.4, although better results are expected if both techniques are used together.

Semblance-Based ERP Window Selection by Channels.
Let the signals be denoted by x c (t), where c corresponds to the channel and t ∈ 1, . . . , T { }, and M be the set of all the stimuli, M � T, N { }, in which T corresponds to the subset of target stimuli and N corresponds to the subset of nontarget stimuli. e responses to a stimulus in a predefined temporal window of size t up can then be extracted as follows: Computational Intelligence and Neuroscience where s i corresponds to the stimulus onset i. e average response for each type of stimulus, and for each channel, can be computed as follows: in which the operator | · | denotes the cardinal number. After obtaining the averages, we compute their continuous to calculate the dot product D through equation (5).
In the example given in Figure 1, the result obtained for the dot product D corresponds to Figure 1(a), where the cold colors indicate that the signals have a maximum difference, and P300 is located in the spatial space. e normalised standard deviation of D is shown in Figure 1(b). By using a threshold τ w , 0 ≤ τ w ≤ 1, the new temporal window, for the current channel, is selected within limits t low and t up obtained through a sequential search of the first points that exceed the threshold from both the left and right boundaries. We named this algorithm "semblance-based ERP window selection by channels" (SEWS-1), and Algorithm 2 describes the complete window selection process.

Semblance-Based ERP Window Selection over Channels.
A variation of this model is used to perform the semblance-based ERP window selection over channels (SEWS-2), as summarised in Algorithm 3, where the same (1) Input: given the EEG signal matrix X, with C channels and T temporal samples (2) Output: the denoised signals X (3) Set the correlation threshold τ d , 0 ≤ τ d ≤ 1 (4) for c � 1 to C do (5) Compute the Hilbert transform H c of x c (6) Compute the DWT W x c ψ of signal x c and W H c ψ of H c (7) end for (8) for t � 1 to T do (9) Compute the MRL(t) using equation (6) (10) if MRL(t) < τ d then (11) set to zero W  Computational Intelligence and Neuroscience temporal window is selected for all channels. To do this, the only difference in the process is to compute the grand average through all of the channels using Further information about algorithms and methods can be found in [41].

Databases
2.6.1. UAM Dataset. We employ a P300 dataset [42] recorded by the Neuroimaging Laboratory of Universidad Autónoma Metropolitana (UAM), Mexico, using the P300 speller [4] included in the BCI2000 platform [43]. e dataset (http://akimpech.izt.uam.mx/p300db/; https:// tinyurl.com/ycr52v9b) contains 22 first-time healthy users, and all subjects have similar characteristics, such as sleep duration and age. A total of 10 electrodes were recorded (Fz, C3, Cz, C4, P3, Pz, P4, PO7, PO8, and Oz) providing the best features for discrimination [20,44]. e signals were recorded at 256 Hz using a g.tec g.USBamp EEG amplifier, a right ear reference, and a right mastoid ground. An 8th order 0.1-60 Hz Chebyshev bandpass filter and a 60 Hz notch were used. e stimulus is highlighted for 62.5 ms with an interstimulus interval of 125 ms. In order to validate our algorithms, we use session 1 (copy spelling session) to train a classifier and session 3 (free spelling session) to test the generated models. is choice was made because both sessions do not use feedback and because the detection methods have to be robust under different conditions; that is, the algorithms will be studied under a multisession scheme. From a single-trial perspective, the dataset contains 240 letters per person for session 1 and 250 letters approximately for session 3 because the number of letters varies depending on the subject. Finally, considering that, for the detection of one letter, it is necessary to identify two P300s, i.e., detecting twice 1 ERP among 6 responses, and the letter detection by chance is 1/36.

Results
is section presents the experiments for evaluating the performance of the new methods, which are introduced by contrasting them with the xDAWN algorithm and the state of the art on wavelet thresholding techniques. Table 1 shows the results of our denoising algorithm METS compared with the results of the classic thresholding methods. e baseline results only using a bandpass filter of [0. [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20] Hz are also presented. All thresholding techniques improve the baseline result on average, meaning that these techniques are actually denoising the original signals and they do not remove the relevant information. We can observe that the performance of the subject with the maximum value slightly decreases using METS. However, our algorithm is, in general, able to remove the noise from the signal's subjects. is improves the results obtained by the classic thresholding techniques based on wavelets. Moreover, the standard deviation is less in our approach, meaning that the increase in the mean is global and not due to the improvement of subjects with best results. In fact, the maximum value is the same as that obtained by minimax and universal thresholds, while the minimum is improved by METS.

Experiment: METS vs. Classic Wavelet Methods.
(1) Input: given the EEG signal matrix X, with C channels and T temporal samples (2) Output: the bounds of the temporal window t low c and t up c for each C channel (3) Set the window threshold τ w , 0 ≤ τ w ≤ 1 (4) for c � 1 to C do  (11) e limit t low c is given by the first t from the left which meets σ(D(t)) > τ w (12) e limit t up c is given by the first t from the right which meets σ(D(t)) > τ w (13) end for ALGORITHM 2: Semblance-based ERP window selection by channels (SEWS-1).  Table 2 shows the impact of using the METS algorithm as the previous step for the time-window selection. e threshold values for the algorithms are τ d � 0.999 for METS, τ w � 0.1 for SEWS-1, and τ w � 0.2 for SEWS-2, according to the previous study [41].

Experiment: Comparison with xDAWN Algorithm.
e xDAWN algorithm plays a similar role to the algorithms proposed in this paper because METS also uses the multichannel information. Table 3 presents the results of xDAWN, METS & SEWS-1, and METS & SEWS-2, for each subject. e minimum sample size for each subject was 2160 trials. Note that the baseline performance is 1/36 � 2.8% for a random detection. e preprocessing of the signal was performed according to the specifications in [19]. e only difference from our experimental setup is the high cutoff filter frequency of 1 Hz, instead of 0.1 Hz. For xDAWN, three sources (channels) were used because previous studies have shown that the best result was obtained using this configuration [41].
To check the normality assumption, we applied the Shapiro-Wilk pairwise test with a significant level α of 5%.  Table 4 summarises the statistics of the letter accuracy for xDAWN compared to our algorithms. Our proposed algorithms obtained the best results, showing improvements in mean, standard deviation, and maximum and minimum values. Surprisingly, the mean results for xDAWN do not improve the baseline of 52.50% (see Table 4) obtained with the [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19] Hz filter, achieving good results for subjects with high performances and very low results for subjects with low performances, as shown in Table 3. Table 5 shows the results corresponding to the evolution of the algorithms when using different numbers of repetitions, going from a single trial up to five repetitions. It is possible to observe that METS & SEWS-2 perform better than METS & SEWS-1 using repetitions. As was expected, METS & SEWS perform better than xDAWN for a few (two and three) repetitions. For four repetitions, the results are (1) Input: given the EEG signal matrix X, with C channels and T temporal samples (2) Output: the margins for the temporal window t low and t up (3) Set the window threshold τ w , 0 ≤ τ w ≤ 1 (4) Compute the grand average GA T of responses belonging to the target class over channels (5) Compute the grand average GA N of responses belonging to the nontarget class over channels (6) Compute the CWT W GA T ψ and W GA N ψ using equation (2) (7) Compute the semblance using equation (4) (8) Compute the dot product D using equation (5) (9) Compute the standard deviation σ(D) of D and standardise it between 0 and 1 (10) e limit t low is given by the first t from the left which meets σ(D(t)) > τ w (11) e limit t up is given by the first t from the right which meets σ(D(t)) > τ w ALGORITHM 3: Semblance-based ERP window selection over channels (SEWS-2).   Table 6. To check the normality assumption, we applied the Shapiro-Wilk pairwise test with a significant level α of 5%. For all the combination of algorithms, we cannot reject the normality assumption. e performance of the METS-SEWS algorithms is statistically greater than that of the baseline and the METS algorithms according to the t-test with a significant level α of 5%. However, METS & SEWS-1 and METS & SEWS-2 have no statistically significant difference according to the t-test (α � 5%). It is important to note that the combination of METS and SEWS offers an improvement for almost all the subjects. Specifically, SEWS-2 is more effective for disabled subjects (i.e., the first four subjects). Moreover, it obtains an impressive improvement from 45.31% to 62.50% for the healthy subject numbered s8.

Discussion
e results show that the introduction of the channels' correlation in the process and the fact that the channels are processed as a whole improve the denoising step for the classification. e thresholds in classic methods are selected using statistical techniques to infer a suitable threshold for each subject, while for METS, we used a fixed threshold τ d for all subjects.
us, METS could be extended to use a similar automatic threshold selection to obtain better results.
Although the improvements compared to the classical wavelet methods may seem modest in terms of absolute   Computational Intelligence and Neuroscience values, we should take into consideration that the probability of detecting a letter in the P300 speller, by chance, is only 1/36 (detecting twice 1 ERP among 6 responses), in contrast to the probability of randomly detecting a single ERP of 1/2. is means that there is a high probability of missing a letter that was correctly detected (0.97) because of randomness and only a small chance of correctly detecting a letter (0.03) that was previously misclassified.
Also, the METS algorithm is flexible enough to exploit the specific properties of the analysed phenomenon through the parameter selection. In fact, the mother wavelet is one important parameter. Unfortunately, there is no formal method to choose a mother wavelet. It can only be chosen by its resemblance in shape to the underlying target phenomenon or by their properties. Table 3 suggests that the data of subjects with high accuracy are less noisy or have stronger ERP responses, making the classification results similar for all methods. However, for a more general solution that can deal with signals from subjects with poor original performance, either because they are first-time users or because they do not generate a strong P300 response, our approach seems more suitable.
erefore, we conclude that wavelet-based semblance performs better than xDAWN for single-trial detection in the presence of artifacts and noise. Regardless, xDAWN should have a better performance using averaged responses. ese results are consistent with the theory behind both algorithms: xDAWN improves with more repetitions because averaged ERPs are easier to enhance due to the increased SNR, while the effect of METS becomes moderate because averaging naturally removes uncorrelated components. Nevertheless, xDAWN and our proposed algorithms are not mutually exclusive, which means that their clever combination could be developed in the future.
As is explained in Section 2.5, a temporal window must be set to specify the segment to be analysed. e removed information usually lies at the end of the original temporal window, beyond the segment where P300 theoretically lies. Both SEWS algorithms improve the results compared to the results of only using METS. When comparing with the baseline filter results, the combination of our two independent algorithms significantly improves the ERP detection in a single trial and only incurs a small computational cost compared to the time required for the training and classification phases. In particular, SEWS-1 obtains the best maximum and minimum accuracy, but, on average, both window selection algorithms are competitive. Besides, a better SNR makes the classification easier and implies that useless features removed by SEWS have less impact on the decision boundaries. e experiment using the EPFL dataset validates our techniques for single-trial ERP detection not only because it is a different dataset with a different display matrix and patients but also because it uses the same parameters used for the UAM dataset. Finally, better performances are expected if the parameters (like the thresholds) are fine tuned for this dataset.

Conclusion
We introduced two automatic methods in this paper. e first method aims to simultaneously denoise channel recordings, which allows us to detect signal components that are not present in all channels and improve the analysis of EEG signals in general. e second method uses the continuous wavelet transform to analyse ERP's time windows. e core element of the time-window selection algorithms is the dot product D, and the techniques presented in this paper only explore a few of the possibilities of what can be done using D.
e standard deviation exploits the information carried by D. A more elaborate or detailed method could be developed in the future to increase the performance or robustness of the time-window selection. Moreover, our analysis of the similarity of signals based on wavelets opens several research opportunities for EEG applications. For example, the same principle behind the timewindow selection can be applied to detect constant scales through the full signal to select the most useful frequency bands for the problem under study.
Finally, these similarity measures can be applied to the study, and comparison of the EEG behavior of different subjects for the same application can be applied to help us better understand the physiological responses of the brain or to develop more robust BCI techniques. Table 6: P300 detection percentage for the EPFL dataset using the proposed algorithms for each subject. Descriptive statistics for all the subjects on the P300 detection percentage are also reported (mean, standard deviation, min., and max.

Conflicts of Interest
e authors declare that they have no conflicts of interest.