ERP-WGAN: A data augmentation method for EEG single-trial detection

Brain-computer interaction based on EEG shows great potential and has become a research hotspot. However, the limited scale of EEG databases constrains BCI system performance, especially because of the imbalance between positive and negative samples caused by the oddball paradigm. To alleviate this shortage of EEG samples, we propose a data augmentation method based on generative adversarial networks to improve the performance of EEG signal classification. By taking the characteristics of EEG into account in the Wasserstein generative adversarial network (WGAN), the problems of mode collapse and poor artificial data quality were addressed by using resting-state noise, smoothing and random amplitude scaling. The quality of artificial data was comprehensively evaluated in terms of verisimilitude, diversity and accuracy. Compared with three artificial data generation methods and two data sampling methods, the proposed ERP-WGAN framework significantly improves the performance of both subject and general classifiers; in particular, the accuracy of general classifiers trained with fewer than 5 feature dimensions improves by 20-25%. Moreover, we evaluate the performance of training sets with different mixing ratios of artificial and real samples. ERP-WGAN can reduce the required real subject data and acquisition cost by at least 73%, which greatly shortens the test cycle and lowers research cost.


Introduction
EEG has been widely applied in many research fields owing to its non-invasiveness, portability, high temporal resolution and low cost. With the integration of neurocognition and information security, scientists have studied the extraction of individual identity features from brain working patterns. Researchers have extensively studied the application of identity authentication (Mareike et al., 2021) in brain-computer interface (BCI) systems.
Single-trial P300 detection has become a research hotspot owing to its fast interaction speed and efficient interactive performance. Researchers have explored various feature extraction and classification methods for single-trial EEG. Fuhrmann Alpert et al. (2014) introduced a feature extraction method based on the cascaded information of all channels and adopted spatial weighting to identify the event-related potentials (ERP) of a single trial. The secure highest degree clustering algorithm (HDCA) method was proposed by Marathe et al. (2014), which was suitable for EEG signals with temporal variability in real environments. In a target detection experiment with 15 subjects, the HDCA method significantly improved the signal-to-noise ratio (SNR) and the single-trial accuracy. Based on the rhythms of EEG signals, Aydın et al. (2018) applied the wavelet correlation method to extract the non-stationary characteristics of EEG sequences and estimated the level of nonlinear hemispheric synchronization. Moreover, nonlinear methods have often been applied to EEG feature extraction, such as permutation entropy (Aydın et al., 2019), sample entropy (Wang et al., 2021) and the Lyapunov exponent. Cecotti et al. (2014) applied a deep network to single-trial RSVP tests with two different recognition difficulties; the average area under the curve (AUC) values for face and video game reached 0.869 and 0.932, respectively. In addition, the phase space trajectory matrix (PSTM) (Aydın, 2019) was proposed to quantify the complexity of neurons; it incorporates neuro-cortical connectivity to improve the performance of deep learning models. However, the above research still faces the problem of insufficient data, which harms the accuracy and robustness of the classification model.
The scale of EEG databases is the bottleneck in the BCI field (Lashgari et al., 2020). EEG experiments are time-consuming and have long cycles. In addition, the physiological characteristics of subjects limit the length of an EEG experiment. On the one hand, the low efficiency of EEG acquisition limits the sample size. The number of subjects is below 30 in most experiments. A small-scale EEG acquisition plan takes about two months, while large-scale database acquisition greatly increases the research cost and experimental period. On the other hand, the number of samples in a single-trial experiment is insufficient. Fatigue caused by overly long experiments leads to poor EEG data quality. Subjects' physiology restricts the sample size of a single experiment to fewer than 500. In addition, the oddball paradigm requires the target frequency to be below 30%, which results in an even more serious shortage of target samples. The training sample size in BCI technology is inadequate and the positive and negative samples are unbalanced, which limits accuracy and interaction efficiency. The lack of EEG samples has seriously affected the user experience and hindered the adoption of BCI (Chawla et al., 2002; Xu et al., 2015).
In recent years, deep networks have shown great promise in various fields. With the development of generative adversarial networks (GAN) (Lashgari et al., 2020; Goodfellow et al., 2014) in neuroscience, GAN promotes data-driven synthesis and has achieved great success in the field of imaging (Yang et al., 2018). Through multi-round iteration between the generator and the discriminator, the distribution of the fake data gradually approaches the distribution of the real data. By fitting the spatial distribution of real samples, the artificial data not only have the attributes of real samples but also show diversity and even novelty. Because the original GAN suffers from unstable training, WGAN (Arjovsky et al., 2017) optimizes the loss function to improve the stability of learning and to mitigate problems such as mode collapse. Consequently, GAN has become an effective data augmentation method in the BCI field. The artificial data generated by GAN alleviate the problems of unbalanced samples, inadequate samples and overfitting.
Researchers have explored the application of GAN in the field of EEG. In the field of motor imagery, Hartmann et al. (2018) constructed EEG-GAN based on CNN. The team trained on the EEG signals of a simple motion task and calculated several indexes to estimate the performance of artificial samples. Zhang and Liu (2018) adopted the cDCGAN network to expand the artificial data of C3, CZ and C4 electrode samples. The results suggested that classification accuracy was improved by mixed data in different proportions. Nik Aznan et al. (2019) studied several GAN models on a steady state visual evoked potential (SSVEP) dataset, and the cross-subject task accuracy increased by 35%. In the field of emotion recognition, Wang et al. (2018) adopted GAN to generate synthetic data on the SEED dataset (Zheng and Lu, 2015) and the MAHNOB-HCI dataset (Soleymani, 2012). The results indicated that deep network classifiers needed more training data than traditional classifiers. In addition, Brophy applied the DCGAN network from the image domain to generate EEG time series signals.
However, existing GAN methods for EEG are still at the exploratory stage. In particular, existing data enhancement methods struggle with the single-trial P300 task, which suffers from low SNR and high noise. Kwon et al. (2019) specifically pointed out that data augmentation of single-trial EEG is a crucial issue in BCI.
The large diversity of EEG data is partly due to exogenous stimulus parameters such as duration, content, emotion, cognition and other factors. In this study, the EEG generation method targets the time-locked task of the oddball paradigm, especially event-related potential (ERP) components such as P300, P250 and P100. To improve single-trial authentication performance, the data enhancement framework is used to generate high-quality single-trial P300 signals, and classification performance is improved by mixing in artificial data. We adopt the data set of a target detection task, which suffers from a serious sample imbalance problem. The low SNR and large individual differences of EEG make convergence difficult. To make the GAN structure more suitable for EEG samples, EEG signal quality, distribution correlation and generation diversity are considered in the design. On the one hand, to address the low similarity between artificial and real samples, EEG resting-state data become the generator input instead of Gaussian white noise. The resting state represents an individual's basic brain state, and its intrinsic correlation with the task state is greater than that of Gaussian white noise. At the generator output, because of the low SNR and significant individual differences, the generated artificial data contain many glitches and obvious noise; hence low-pass filtering and Gaussian smoothing are added to the generator output. On the other hand, to address the insufficient diversity of artificial samples, the amplitude of the artificial EEG is adjusted randomly. In addition, the generative model converges after 500 iterations and produces high-quality artificial data, so generators trained for more than 500 iterations perform well. We apply multiple generators with different iteration counts to improve the diversity of artificial data: 10 GAN networks with different numbers of iteration rounds are trained to generate abundant artificial data.
In this work, we proposed a data enhancement framework and explored adaptive improvements based on EEG characteristics. To improve the quality of artificial EEG, different adaptive methods were adopted to improve the convergence of the GAN network. We comprehensively analyzed the artificial and real data in terms of verisimilitude, diversity, joint feature distribution and clustering. The proposed method can effectively expand the EEG sample size and enrich the diversity of training data. In terms of classification performance, the combination of artificial and real data significantly improves the performance of the subject-classifier (Aydın, 2019) and the general classifier.
Our major contributions are summarized as follows:
(1) A data enhancement method for EEG signals of time-locked tasks is proposed, which effectively improves the stability of the model and greatly saves training time.
(2) This work alleviates the EEG sample imbalance problem by generating artificial data, and the mixed training set improves the performance of general and subject-classifiers.
(3) This study presents a variety of mixing strategies for artificial and real EEG data, and summarizes the optimal fusion ratio and the limit mixing ratio.

Data augmentation framework
The data generation framework for single-trial EEG is visualized in Fig. 1. The whole framework consisted of three parts: the ERP-WGAN module, the classifier-training module and the test-feedback module. The data generation module (Fig. 1 A) trained the generative network by imitating the real EEG signals. The ERP-WGAN produced the artificial EEG data in batches. The classifier-training module (Fig. 1 B) mixed the artificial positive samples with the real positive samples so that the number of mixed positive samples equaled the number of real negative samples in the training set. The mixed samples were used to train the target detection classifier. In the test-feedback module (Fig. 1 C), EEG signals of real subjects were used to evaluate the classification performance, and the test results were fed back to optimize the model parameters.
The database used in this study was the EEG target detection task of Zhimin et al. (2017). The 19 participants were divided into training data (18 participants) and test data (1 participant). The training data of 18 subjects were used to train the generative network to produce artificial data, and the real and artificial data were mixed to train the classifier to distinguish target from non-target EEG data. The training set comprised 3240 s of resting state (1080 s per person) and 5400 target EEG data (300 per person) from 18 subjects for ERP-WGAN training. The 5400 target EEG data were used as real templates to guide the generator in transforming the resting-state signal into realistic artificial data. The implementation details of the data enhancement framework are as follows.
The ERP-WGAN model training process is as follows:
Step 1: Randomly intercept 16-point resting-state EEG data from the training set. Then, the generator propagates the resting-state EEG layer by layer to form 960-point artificial data.
Step 2: The original artificial data are filtered by the low-pass filtering module to improve the quality of the fake EEG data.
Step 3: Real and artificial EEG data are input into the discriminator, which judges their authenticity.
Step 4: The Wasserstein loss function is applied to measure the distribution distance between real samples and artificial samples.
Step 5: The generator updates the network parameters for every round to reduce the distribution distance.
Step 6: The discriminator updates the network parameters every 5 rounds with the goal of increasing the distribution distance.
Repeat 1: Cycle steps 1-6 until the artificial samples show obvious ERP components and the loss function stabilizes. At this point, the generator and discriminator are close to Nash equilibrium, and the training of a single ERP-WGAN model is complete.
Step 7: Feed multiple groups of resting-state signals to the trained generator. Randomly jitter the amplitude of the generated artificial data to enhance diversity.
The classifier-training module process is as follows:
Step 8: Artificial samples and real samples from 18 subjects are fused according to a variety of mixing-ratio strategies.
Step 9: Extract the features of the mixed samples, then feed the sample-feature training matrix into the classifier.
The test-feedback module process is as follows:
Step 10: The target and non-target real EEG data of 1 subject are used for classification tests.
Step 11: Classification accuracy and clustering visualization are fed back to correct the classifier and the ERP-WGAN model.
Repeat 3: Cycle steps 1-11. Data generation and target versus non-target EEG classification tests are applied to each participant. The classification results are aggregated and averaged as the overall performance.
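The alternating update schedule of Steps 5 and 6 can be sketched as follows. This is only an illustration of the scheduling logic: `run_schedule`, `generator_step` and `discriminator_step` are hypothetical names, and the step functions stand in for the actual gradient updates, which are not specified here.

```python
def run_schedule(n_rounds, generator_step, discriminator_step, disc_every=5):
    """Call the generator update every round and the discriminator
    update every `disc_every` rounds (Steps 5 and 6)."""
    for round_idx in range(1, n_rounds + 1):
        generator_step(round_idx)
        if round_idx % disc_every == 0:
            discriminator_step(round_idx)


# Placeholder updates that only count how often they are called.
calls = {"gen": 0, "disc": 0}


def count_gen(_round_idx):
    calls["gen"] += 1


def count_disc(_round_idx):
    calls["disc"] += 1


run_schedule(500, count_gen, count_disc, disc_every=5)
print(calls)  # {'gen': 500, 'disc': 100}
```

In a real training loop the two callbacks would compute the Wasserstein loss on a mini-batch and apply an optimizer step; the stopping rule of Repeat 1 (stable loss, visible ERP components) would replace the fixed round count.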

ERP-WGAN module
The basic GAN (Goodfellow et al., 2014) consisted of a generator and a discriminator. The generator transformed the input noise to fool the discriminator, while the discriminator continuously improved its ability to distinguish artificial from real data. GAN trained the generator network and the discriminator network alternately, which brought the generator output close to the real data distribution. After several rounds of iterative updating, the generator and discriminator reached Nash equilibrium. The generator was then able to produce realistic artificial samples that confused the discriminator.
The following innovations are proposed for ERP-WGAN based on EEG characteristics:
(1) EEG resting-state data become the generator input instead of Gaussian white noise. The correlation between resting-state and task-state EEG improves the stability of the model and saves training time.
(2) Low-pass filtering is applied to the generator output, which improves the quality of artificial EEG signals and reduces the number of iterations.
(3) Multiple generators with different numbers of iteration rounds jointly generate artificial data, which improves the diversity of artificial EEG samples.

Generator
Fig. 1A presents the generator training stage of the ERP-WGAN network. The basic GAN model (Goodfellow et al., 2014) was improved to promote the verisimilitude and diversity of generated samples.
To improve the verisimilitude of artificial data, subjects' resting-state data were used as input in place of Gaussian white noise. As the basic state of the brain, the resting state is routinely used as a baseline for comparing task-state brain responses. The randomness and fluctuation of the EEG signal were consistent with Gaussian white noise. Studies (Li et al., 2020; Li et al., 2019) have reported correlations between the amplitude and latency of the resting state and the task state. The prior distribution of the resting state was closer to the task state than Gaussian white noise. These correlations reduced the difficulty of transforming the input noise into task-state brain signals. Consequently, resting-state data offer an advantage as the prior input of the generative network. In addition, the generated artificial data contained serious glitches and obvious high-frequency noise. Given that the effective components of ERP are mainly concentrated at low frequencies below 30 Hz, a 0-30 Hz low-pass filter was placed at the generator output. A Gaussian smoothing module was also adopted to alleviate abrupt glitches in the artificial data.
Two methods were used to improve the diversity of artificial data. On the one hand, a random amplitude module was inserted between the ERP-WGAN module and the classifier-training module. The amplitude of the artificial EEG was randomly scaled by a factor of 0.9-1.1 to improve variety. On the other hand, 10 generator models were trained with different numbers of iterations, and each generator contributed artificial samples to the training set.
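The three post-processing stages described above (0-30 Hz low-pass filtering, Gaussian smoothing and 0.9-1.1 amplitude scaling) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the filter design used in the study: the sampling rate `FS = 600` Hz is inferred from 960 points spanning 1600 ms and is not stated in the text, the ideal FFT-based low-pass is a stand-in for whatever filter was actually used, and the smoothing width `sigma` is a hypothetical value.

```python
import numpy as np

FS = 600.0  # assumed sampling rate in Hz (960 points over 1.6 s)


def lowpass_fft(x, fs=FS, cutoff_hz=30.0):
    """Ideal low-pass: zero every frequency bin above cutoff_hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spectrum[..., freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[-1])


def gaussian_smooth(x, sigma=2.0):
    """Smooth a 1-D signal with a normalized Gaussian kernel (sigma in samples)."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")


def jitter_amplitude(x, rng, low=0.9, high=1.1):
    """Scale the whole trial by one random factor drawn from [low, high)."""
    return x * rng.uniform(low, high)


rng = np.random.default_rng(0)
t = np.arange(960) / FS
slow = np.sin(2 * np.pi * 5 * t)                 # 5 Hz component, kept
raw = slow + 0.5 * np.sin(2 * np.pi * 100 * t)   # 100 Hz component, removed
filtered = lowpass_fft(raw)
clean = jitter_amplitude(gaussian_smooth(filtered), rng)
```

Because both test frequencies fall exactly on FFT bins for this signal length, `filtered` matches the 5 Hz component to floating-point precision, which makes the effect of the cutoff easy to check.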

Discriminator
The main task of the discriminator (Mirza and Osindero, 2014) is to distinguish real from artificial data and to calculate the loss function between them. The loss function measures the distance between the artificial samples and the real samples and is used as the back-propagated gradient signal to update the network parameters. KL divergence (Barz et al., 2018) was the loss function of the basic GAN network, and the fidelity of the artificial samples was quantified by calculating the relative entropy between the real and artificial samples. The JS divergence (Arjovsky and Bottou, 2017) was derived from the KL divergence to solve the asymmetry of the loss function. In further research, the Wasserstein distance showed better performance than KL divergence and JS divergence in measuring the distance between sample distributions. Its outstanding advantage is that it can quantify the distance between two non-overlapping distributions. Therefore, ERP-WGAN adopted the Wasserstein distance as its loss function to improve model stability and training efficiency.
Moreover, fewer network layers and multi-step iterations limited the initial classification ability of the discriminator, which was beneficial to the continuous learning of the generator in the game process.
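The advantage for non-overlapping distributions can be illustrated with a toy one-dimensional example. The sketch below is not the network loss (in a WGAN the distance is estimated by the discriminator); it uses the closed form for two equal-sized 1-D samples, where the Wasserstein-1 distance is the mean absolute difference of the sorted values. The function name and the sample values are hypothetical.

```python
def wasserstein_1d(a, b):
    """Empirical Wasserstein-1 distance between two equal-sized 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch expects equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


real = [0.0, 0.5, 1.0]
near = [x + 2.0 for x in real]   # no overlap with `real`, small shift
far = [x + 10.0 for x in real]   # no overlap with `real`, large shift

d_near = wasserstein_1d(real, near)  # 2.0
d_far = wasserstein_1d(real, far)    # 10.0
```

Both shifted samples are disjoint from `real`, yet the distance still grows with the shift, so a generator receives a useful signal about how far its output is from the real data. The JS divergence, by contrast, takes the same value (log 2) for any two distributions with disjoint support.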

Network parameters
A mini-batch of generator input consisted of 20 samples. The resting-state data were divided into 16 non-overlapping time periods and standardized by the Z-score method. The generator consisted of five fully connected layers (network nodes: 16, 64, 128, 256, 960). The rectified linear unit (ReLU) was applied as the activation function. The discriminator consisted of four fully connected layers (network nodes: 960, 256, 64, 2). The activation function of the first two layers of the discriminator was ReLU, and that of the last layer was sigmoid.
The learning rates of the generator and discriminator were 0.00015 and 0.0001, respectively. The discriminator updated the gradient every 10 iterations. Each generator was trained for a randomly chosen number of iterations between 500 and 1800. Each channel trained 10 independent ERP-WGAN models, and a total of 160 generators were trained to produce artificial data. Every ERP-WGAN model generated 2000 EEG samples, so each channel obtained a total of 20000 artificial EEG data.
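The layer sizes listed above can be checked with a small NumPy forward pass. This sketch makes several assumptions that the text does not confirm: the node lists are read as successive layer widths with the first entry as the input size, the generator's final layer is left linear so that outputs can be negative, the weights are random rather than trained, and the input is a random stand-in for Z-scored resting-state segments.

```python
import numpy as np

GEN_NODES = [16, 64, 128, 256, 960]   # widths as listed for the generator
DISC_NODES = [960, 256, 64, 2]        # widths as listed for the discriminator


def init_mlp(nodes, rng):
    """Random (weight, bias) pairs for consecutive layer widths."""
    return [(rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(nodes[:-1], nodes[1:])]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, layers, final_activation=None):
    """ReLU after every layer except the last; optional final activation."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
        elif final_activation is not None:
            x = final_activation(x)
    return x


rng = np.random.default_rng(0)
generator = init_mlp(GEN_NODES, rng)
discriminator = init_mlp(DISC_NODES, rng)

# Mini-batch of 20 inputs with 16 points each, Z-scored per sample.
rest = rng.standard_normal((20, 16))
rest = (rest - rest.mean(axis=1, keepdims=True)) / rest.std(axis=1, keepdims=True)

fake = forward(rest, generator)                                   # (20, 960)
scores = forward(fake, discriminator, final_activation=sigmoid)   # (20, 2)
n_gen_params = sum(w.size + b.size for w, b in generator)         # 289152
```

Under this reading the generator has 289152 trainable parameters, most of them in the final 256-to-960 layer.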

Classifier-training module
The classifier-training module included sample mixing, feature extraction and classifier design.
The classification task was to distinguish the brain patterns activated by target and non-target information, that is, subjects' brain activity when watching target images (P300 response) and non-target images (non-P300 response). It should be noted that resting-state EEG was only used in the data generation stage and was not involved in the later classification task.
GAN model training needs a large amount of real data, and the data of a single subject are insufficient for the generative model to converge. Therefore, this work completed ERP-WGAN training using group samples from 18 subjects. In particular, both the classifier and the ERP-WGAN model were trained on training set data, which do not overlap with the test samples.
The mixed samples expanded the training set and balanced the number of positive and negative samples. The ratio of raw positive to negative samples in the target detection task (Zhimin et al., 2017) was 1:12. Hence the data generation model was directed toward generating artificial positive samples. This research focused on the proportion of artificial and real samples in the training set. On the one hand, the mixing ratio affected the performance of the classification model. On the other hand, the upper limit on added artificial samples is constrained by the number of real samples. We explored both low and high proportions of artificial data mixed with real data. Fifteen different mixing ratio schemes were explored.
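The balancing arithmetic behind the mixed training set can be made explicit. The helper names below are hypothetical, and the counts in the example are illustrative values chosen to match the 1:12 ratio, not figures from the dataset.

```python
def artificial_needed(n_real_pos, n_real_neg):
    """Artificial positives required so that mixed positives equal real negatives."""
    return max(n_real_neg - n_real_pos, 0)


def artificial_fraction(n_real_pos, n_artificial):
    """Share of artificial samples within the mixed positive class."""
    return n_artificial / (n_real_pos + n_artificial)


n_pos, n_neg = 100, 1200                  # illustrative counts at a 1:12 ratio
n_art = artificial_needed(n_pos, n_neg)   # 1100
frac = artificial_fraction(n_pos, n_art)  # 1100 / 1200
```

At a 1:12 ratio, full balancing means that eleven of every twelve positive training samples are artificial, which is why the proportion of artificial to real data is examined as a separate design variable.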
In the feature extraction step, previous research indicated that the main feature difference in the target detection task was the P300 component in the time domain. First, we extracted features from the ERP time-domain amplitude over 0-1600 milliseconds (ms). Each trial was divided into 8 segments of 200 ms. Second, the amplitudes within each segment were summed, giving a total of 16 * 8 = 128 energy features. Finally, principal component analysis (PCA) was applied to reduce the dimension of the 128 original features. The number of features was reduced by roughly half to 60 dimensions, which effectively alleviates overfitting and reduces the computational cost.
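The feature pipeline above (8 segments per channel, summed amplitudes, PCA to 60 dimensions) can be sketched in NumPy. The array layout `(n_trials, n_channels, n_samples)`, the 960-sample trial length and the SVD-based PCA are assumptions made for illustration; the random trials are placeholders for real epochs.

```python
import numpy as np


def segment_sum_features(trials, n_segments=8):
    """(n_trials, n_channels, n_samples) -> (n_trials, n_channels * n_segments).

    Each channel is cut into n_segments equal parts and the amplitudes
    inside each part are summed.
    """
    n_trials, n_channels, n_samples = trials.shape
    seg_len = n_samples // n_segments
    trimmed = trials[:, :, :seg_len * n_segments]
    segments = trimmed.reshape(n_trials, n_channels, n_segments, seg_len)
    return segments.sum(axis=3).reshape(n_trials, n_channels * n_segments)


def pca_reduce(features, n_components=60):
    """Project mean-centered features onto the leading principal axes."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
trials = rng.standard_normal((100, 16, 960))   # placeholder epochs
feats = segment_sum_features(trials)           # (100, 128)
reduced = pca_reduce(feats)                    # (100, 60)
```

In practice the PCA axes would be fitted on the training set only and then applied unchanged to the test set, so that no test information leaks into the projection.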
This work compared linear discriminant analysis (LDA) and logistic regression (LR) classifiers. The two classifiers were used to test the data augmentation performance of mixed training sets with different proportions. The classifiers were implemented with Matlab functions. The LDA classifier used the fitcdiscr function with the parameter ('DiscrimType', 'linear'). The LR classifier used glmfit with the parameters ('binomial', 'link', 'logit').

Test-feedback module
The classification results of the subject-classifier and general classifier were fed back to the generation and classifier modules. The GAN network parameters were optimized according to the classification results to generate higher-quality artificial samples. The performance of the subject-classifier was tested by five-fold cross validation. The raw data were divided into a training set and a test set in a ratio of 4:1. The first twenty blocks formed the training set, and the last five blocks formed the test set. In the general classifier performance test, each subject was selected in turn as the test set, and the other 18 subjects together with the artificial EEG data formed the training set. The subjects in the training set and test set did not overlap in the general classifier. The average accuracy over the 19 subjects was used to evaluate the general classifier performance.
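The two evaluation splits described above can be written down in a few lines. The function names are hypothetical, subjects and blocks are identified only by index, and the total of 25 blocks is inferred from "first twenty" plus "last five".

```python
def leave_one_subject_out(subject_ids):
    """Yield (training subjects, held-out subject) for the general classifier."""
    for held_out in subject_ids:
        train = [s for s in subject_ids if s != held_out]
        yield train, held_out


def block_split(n_blocks=25, n_train=20):
    """First n_train blocks for training, the remaining blocks for testing."""
    blocks = list(range(1, n_blocks + 1))
    return blocks[:n_train], blocks[n_train:]


folds = list(leave_one_subject_out(list(range(1, 20))))   # 19 folds of 18 + 1
train_blocks, test_blocks = block_split()                  # blocks 1-20 and 21-25
```

Artificial samples would be added only to the training side of each fold, so that every test result is computed on real EEG from a subject or block the classifier has not seen.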

Conventional data generation and sampling methods
To measure the performance of ERP-WGAN in generating artificial EEG data, we compared it with 3 conventional data generation methods and 2 conventional data sampling methods. A data generation method expands the dataset by constructing new artificial samples that are not included in the real data, while a data sampling method achieves sample balance by copying or eliminating real data.
Conventional data generation methods include SMOTE (Synthetic Minority Oversampling Technique) (Deepa et al., 2010; Baby et al., 2010; Moubayed et al., 2017), Borderline-SMOTE (Han et al., 2005) and ADASYN (adaptive synthetic sampling) (Pereira and Gomes, 2016; Shoorangiz et al., 2016). As shown in Fig. 2, the three traditional methods differ in where in feature space they tend to generate artificial samples. The SMOTE method generates data at random positions on the line connecting two real samples. The Borderline-SMOTE approach first uses the k-nearest neighbor algorithm to divide real samples into safe, danger and noise samples, and only generates artificial samples between two real danger samples on the boundary. The ADASYN method assigns the number of artificial samples according to the distance between the real samples and the boundary, so more artificial samples are generated near real samples that are close to the boundary.
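The interpolation step of SMOTE can be sketched as below. This follows the standard formulation, in which each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbours; it is a minimal illustration, not the implementation used for the comparison, and it does not cover the Borderline-SMOTE or ADASYN selection rules. The function name and the toy data are hypothetical.

```python
import numpy as np


def smote_sample(minority, k=5, n_new=10, rng=None):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng() if rng is None else rng
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)
        other = neighbours[base, rng.integers(k)]
        lam = rng.random()  # position along the connecting segment
        synthetic[i] = minority[base] + lam * (minority[other] - minority[base])
    return synthetic


points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
synth = smote_sample(points, k=2, n_new=20, rng=np.random.default_rng(0))
```

Because every synthetic point is a convex combination of two neighbouring real samples, the new data never leave the region already spanned by the minority class, which is one reason neighbouring SMOTE samples tend to be highly similar to one another.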
Conventional data sampling methods include up-sampling and down-sampling, which balance positive and negative samples by copying minority class samples or reducing the number of majority class samples. The data sampling method does not generate artificial data beyond the real dataset, but it is widely used as an effective sample equalization method in data processing.

Quantification of verisimilitude and diversity
The evaluation of artificial data covered verisimilitude and diversity. Verisimilitude ensured the fidelity with which artificial data simulate real data, and the diversity of artificial data enriched the limited distribution range of real samples. Rich diversity of artificial data can expand the coverage of the training set in high-dimensional feature space. A broad sample distribution improves the robustness of the classification model and reduces overfitting.
Verisimilitude was evaluated by the cross-correlation matrix of real and artificial samples. We randomly selected 100 real samples and 100 artificial samples and calculated the Pearson correlation coefficient between the two kinds of samples. The correlation coefficients were assembled into a 100 * 100 correlation coefficient matrix. A high correlation coefficient between real and artificial samples indicated excellent verisimilitude. In addition, the mean value of the correlation coefficient matrix described the overall verisimilitude of the artificial samples.
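The cross-correlation matrix can be computed in one matrix product once each trial is standardized. The sketch below assumes trials are stored as rows of equal length; the function name and the random placeholder data are hypothetical.

```python
import numpy as np


def cross_correlation_matrix(a, b):
    """Pearson correlation between every row of `a` and every row of `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    az = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    bz = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    return az @ bz.T / a.shape[1]


rng = np.random.default_rng(0)
real = rng.standard_normal((100, 960))         # placeholder real trials
artificial = rng.standard_normal((100, 960))   # placeholder artificial trials
corr = cross_correlation_matrix(real, artificial)   # 100 x 100 matrix
overall = corr.mean()                                # single summary score

check = cross_correlation_matrix([[1, 2, 3, 4]], [[2, 4, 6, 8], [4, 3, 2, 1]])
```

Passing the same set of artificial trials as both arguments gives the within-set matrix, whose off-diagonal entries describe how similar the generated samples are to one another.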
The correlation coefficient matrix within the artificial samples measured the diversity of the generated samples. We randomly selected 200 data from the generated artificial samples and calculated the Pearson correlation coefficient between any two artificial samples. A low correlation among generated samples indicates favorable diversity. Furthermore, the average of the 100 * 100 correlation coefficient matrix of artificial samples measured their overall diversity.
P300 is the most obvious EEG component activated by the target trial, and its latency and peak are shown in Fig. 3. Both artificial and real samples showed the same average latency of 0.516 s. The P300 latency was 400-600 ms in most channels. The latency variance of the artificial data is lower than that of the real data by less than 0.02, so the two variances are very close. The P300 peak amplitudes of artificial and real data are similar, but the amplitude of artificial data is lower in most channels. In addition, the P300 peak variance of artificial data is also slightly smaller than that of real EEG signals. In summary, artificial data imitate real data very well, but their peak and variance are slightly smaller, indicating that the diversity of artificial samples still needs to be improved.

ERP analysis
Whereas the P300 peak of the artificial data is lower than that of the real data, the amplitude of the early ERP components is slightly higher in the artificial data. The artificial and real data coincide well in the Cz and C4 channels, and both show obvious P1 and P2 components. We speculate that the generator keeps the energy of artificial and real data equal, resulting in fluctuations across different ERP components whose amplitude differences are approximately complementary.
After the P300 component, the real EEG signal presented a slow wave with negative voltage, which also appeared in the artificial EEG and gradually tended toward zero voltage. The artificial EEG signals accurately learned the detailed components at 900-1500 ms in most channels. ERP-WGAN can not only emulate the obvious components of real EEG but also simulate the subtle jitter within slow wave components.
The standard deviation of artificial data demonstrated that the single-trial EEG had high volatility, which indicated significant differences between artificial EEG samples. The diversity of artificial EEG samples reduces the risk of overfitting in the training set and approximates the volatility and randomness of real EEG. On the one hand, the artificial EEG signals accurately imitated the real EEG signals: the mean values of the two signals were close in latency, amplitude, components and details. On the other hand, the artificial EEG signal took the real EEG signal as the baseline pattern from which to expand and diverge. Individual artificial EEG samples had rich diversity.
[...] occipital and central region to frontal lobe. The amplitude of the two kinds of EEG signals increased significantly in the frontal lobe. Importantly, the P300 component was the most significant component in image target detection, with a wide range of activation in the central area, parietal lobe and frontal lobe. The latency of the P300 component in the experiment was about 500 ms. The artificial and real EEG revealed high-amplitude activation covering almost the whole functional area of the brain. The activation area was symmetrical between the left and right hemispheres, centered on CZ and decreasing outward. A slow negative potential wave appeared after the P300 component. The topographic maps of artificial and real EEG gradually changed from negative potential to zero. In general, artificial and real EEG data demonstrated high similarity in activation area and response magnitude.

Verisimilitude and diversity analysis
The verisimilitude and diversity of the four data generation methods are shown in Fig. 5. Fig. 5 A presents the autocorrelation matrices of the four kinds of artificial data. The ERP-WGAN matrix presented high correlation only on the diagonal, which represents each sample with itself. For the SMOTE, Borderline-SMOTE and ADASYN methods, however, the correlation coefficients near the diagonal of the autocorrelation matrix were also high. This indicated that adjacent artificial samples from the traditional methods were highly similar. Fig. 5 C reveals that the average self-similarity of artificial data generated by ERP-WGAN was one fifth that of the traditional methods. Low autocorrelation verified the diversity of the ERP-WGAN method.
Fig. 5 B visualizes the cross-correlation matrix between artificial EEG and real EEG. The cross-correlation matrix of ERP-WGAN was significantly higher than those of the three conventional data generation methods. Moreover, ERP-WGAN's cross-correlation matrix was more uniformly distributed, which showed that the high verisimilitude of its artificial samples was universal. Fig. 5 D shows the average value of the cross-correlation matrix. The results revealed that the average verisimilitude of ERP-WGAN was three times that of SMOTE and seven times that of Borderline-SMOTE and ADASYN. In this work, the verisimilitude and diversity of the data augmentation methods were comprehensively analyzed. In descending order of artificial data performance, the methods ranked ERP-WGAN, SMOTE, Borderline-SMOTE and ADASYN.

Sample distribution analysis
In the previous section, we analyzed the verisimilitude and diversity of pure artificial data. This section tested the clustering performance of the mixed training set of artificial and real data. The ERP-WGAN method was compared with common methods for handling imbalanced samples, including three methods of generating artificial data (SMOTE, Borderline-SMOTE and ADASYN) and two sampling methods: over-sampling (Cao et al., 2016) and under-sampling (Liu et al., 2009). Over-sampling and under-sampling achieve sample balance by repeating the small-scale samples or subsampling the large-scale samples, which easily leads to overfitting of positive samples and inadequate utilization of negative samples.
Fig. 6A visualizes the distribution of mixed positive samples (artificial and real data) and negative samples (real data only). The cross of the corresponding color represented the category center, and the radius of the transparent circle indicated the degree of dispersion. High-dimensional features were reduced by the t-SNE method for visualization. ERP-WGAN presented the best clustering performance. The Euclidean distance between the positive and negative sample centers of the ERP-WGAN method reflected its discrimination. In addition, the relatively small radius of the transparent circle indicated that the mixed samples had outstanding stability. The other five traditional methods had large overlapping areas of positive and negative samples, which made the training set harder to discriminate.
[...] times of the other five methods. The comprehensive scores of the three conventional data generation methods were only better than those of the two data sampling methods. The outstanding clustering of ERP-WGAN resulted from its superior performance in discrimination and stability.

Subject-classifier performance
The previous experiments revealed the distribution and quality of samples in the training set; the final performance of the samples is reflected in the classification results. The subject-classifier was tested by five-fold cross validation. LDA and LR classifiers were used to test the performance of the six sample imbalance processing methods. The test set consisted only of real subject EEG. The evaluation criteria included accuracy, AUC and the receiver operating characteristic (ROC) curve.
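For reference, the two scalar criteria can be computed as in the sketch below. AUC is written in its rank form, the probability that a randomly chosen positive trial scores higher than a randomly chosen negative one, with ties counted as one half. The labels, scores and the 0.5 threshold are hypothetical.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of trials classified correctly at a fixed decision threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

labels = [1, 1, 0, 0, 0]              # 1 = target trial, 0 = non-target trial
scores = [0.9, 0.4, 0.6, 0.3, 0.1]    # hypothetical classifier outputs

auc = auc_score(labels, scores)
acc = accuracy(labels, scores)
```

Unlike accuracy, AUC does not depend on the decision threshold, which makes it the more informative criterion when positive and negative trials are imbalanced.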
The relationship between the number of features and accuracy is shown in Fig. 7. PCA extracted the first 60 principal components of the 128 features to prevent overfitting. The results of the two classifiers revealed that accuracy increased with the number of features, especially in the range of 28-38 features. The LDA classifier accuracy of ERP-WGAN was 5-15% higher than that of the other five methods. The ERP-WGAN method with more than 34 features had the highest discrimination accuracy in the LR classifier, as shown in Fig. 7. [...] 19 subjects in the ERP-WGAN method was significantly higher than that in the F-HDCA method by 7% (P = 0.0039 < 0.01). The training set with data augmentation significantly improved the classification performance of most subjects (except subjects No. 4, 9 and 18), and the effectiveness of the method reached 84%. By adopting the ERP-WGAN method, the AUC of 6 subjects (subjects No. 2, 6, 11, 13, 17 and 19) increased by more than 10%. The classification performance of 9 excellent subjects reached 90%.

Mixed samples in general classifier
The ERP-WGAN method achieved excellent classification results in the subject-classifier test. In this section, we test the general classifier performance of the ERP-WGAN method and analyze different proportions of the mixed training set. To study the influence of a wider range of ratios on classifier performance, we designed 15 mixing-ratio schemes, from pure real data to pure artificial data. The general classifier adopted cross-subject data as the test set. Fig. 10 shows the average AUC of 18 subjects. The AUC of the general classifier was slightly lower than that of the subject-classifier, which indicated that individual differences affected the classification accuracy.
Fig. 10A visualizes the general classification results for training sets mixed with a low proportion of artificial data. The black curve represents the baseline accuracy of pure real data. The AUC of mixed data with fewer than 18 features was higher than that of pure real data. Specific mixing ratios (real: artificial = 1:0.5, 1:0.25, 1:0.875) were better than pure real samples across the full range of feature numbers. Fig. 10B shows training sets mixed with a high proportion of artificial data. Pure real samples and appropriate mixing ratios (real: artificial = 1:1.5, 1:2, 1:2.75) achieved similar classification performance. However, the AUC decreased seriously when the number of artificial samples was more than 4 times that of real samples.
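A mixed training set for a given real: artificial ratio can be assembled as in the sketch below. It assumes that all real trials are kept and that artificial trials are drawn at random from a pre-generated pool; the function name, pool size and trial counts are hypothetical.

```python
import random

def mix_training_set(real, artificial_pool, ratio, rng):
    """Return all real samples plus round(ratio * len(real)) artificial samples.

    ratio is the artificial part of a real: artificial = 1:ratio scheme.
    """
    n_artificial = round(ratio * len(real))
    if n_artificial > len(artificial_pool):
        raise ValueError("artificial pool too small for the requested ratio")
    return list(real) + rng.sample(list(artificial_pool), n_artificial)

rng = random.Random(0)
real = [("real", i) for i in range(40)]
pool = [("artificial", i) for i in range(200)]

low_mix = mix_training_set(real, pool, 0.5, rng)     # 1:0.5
high_mix = mix_training_set(real, pool, 2.75, rng)   # 1:2.75
```

Sweeping the ratio from 0 (pure real data) upward reproduces the kind of scheme grid compared in Fig. 10.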
Mixed samples were beneficial both for precision and for saving samples. On the one hand, they improved accuracy. Most of the training sets mixed with artificial samples improved the classifier performance. The AUC values of mixed samples were 6-25% higher than those of pure real samples for 2-5 principal component features (red dotted box), which indicated the classification advantage of mixed samples with low-dimensional features. For high-dimensional features, the appropriate mixing ratios (real: artificial = 1:0.25, 1:0.5, 1:0.875, 1:2, 1:2.75) improved the AUC by 3-5% compared with pure real data. The classification performance increased significantly when the numbers of artificial and real samples were of similar scale. The AUC of the optimum ratio (real: artificial = 1:0.5) reached 86.24%. The mixed training set improved the precision for artificial-to-real ratios in the range of 0.25-2.75.
On the other hand, the proposed method can significantly reduce the number of EEG samples required from real subjects. Consistent results were obtained for pure real samples and mixed samples (real: artificial = 1:2.75). At a ratio of 2.75, researchers need to collect only 27% of the data from real subjects and can adopt ERP-WGAN to generate the remaining 73% as artificial data. The ERP-WGAN data augmentation method thus saves 73% of the subject number.
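The 27% and 73% figures follow directly from the 1:2.75 ratio, assuming the total size of the training set is held fixed:

```latex
\[
\text{real fraction} = \frac{1}{1 + 2.75} = \frac{1}{3.75} \approx 0.267 \approx 27\%,
\qquad
\text{artificial fraction} = \frac{2.75}{3.75} \approx 0.733 \approx 73\%.
\]
```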

Structure design of ERP-WGAN
The innovation of the ERP-WGAN structure is based on EEG characteristics, rather than on simply assembling artificial network modules. The randomness and individual differences of EEG data often cause training of the original GAN to collapse or to converge with difficulty. The artificial EEG data generated by the original GAN contained serious glitches, and pure glitch noise differs from the rhythmic EEG signal. Given the low SNR and severe volatility of single-trial EEG, ERP-WGAN focused on improving the stability and convergence of the original GAN. The specific methods include replacing the input Gaussian white noise with a resting EEG signal, placing a noise reduction module at the generator output, and using the Wasserstein distance as the loss function. Fig. 11 compares the artificial data quality of the different EEG improvement modules.
The resting state EEG signal replaced the input Gaussian white noise to prevent network training collapse. In the original GAN training of this experiment, vanishing gradients and EEG fluctuation resulted in network collapse after 7000 iterations. The GAN with resting state input did not collapse. The resting state input solved the problem of training collapse, reduced the number of training iterations and alleviated the glitch noise. On the one hand, the resting state signal had both the randomness of noise and a correlation with the task state signal. EEG signals were unpredictable but had a certain regularity. Many studies have reported that the main frequency energy of the resting state is concentrated below 100 Hz, and that the time domain amplitude is not higher than 100 µV. In addition, the resting state EEG represents the basic neural state of the brain, which is the baseline of the brain task state. Current studies (Wang et al., 2020; Komarov et al., 2020; Brian et al., 2018) proposed that resting state EEG is closely correlated with task state signal amplitude and brain network connectivity. The task state signal can be predicted by analyzing the subjects' resting state. On the other hand, resting state EEG included the noise of the real experimental environment, the noise of the acquisition equipment and electromyogram (EMG) artifacts, while Gaussian white noise lacked prior information about EEG acquisition. The prior information about subjects and environmental noise greatly reduced the difficulty of training the GAN.
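One way to picture the resting state input is as random windows cut from a resting recording and fed to the generator in place of Gaussian noise vectors. The sketch below illustrates only that sampling step; the segment length, batch size and function name are hypothetical, and the synthetic series merely stands in for a real resting state recording.

```python
import random

def sample_resting_inputs(resting, segment_len, batch_size, rng):
    """Cut batch_size random contiguous windows of segment_len samples."""
    max_start = len(resting) - segment_len
    starts = [rng.randint(0, max_start) for _ in range(batch_size)]
    return [resting[s:s + segment_len] for s in starts]

rng = random.Random(0)
# Placeholder data only: a real pipeline would load recorded resting EEG here.
resting = [rng.gauss(0.0, 10.0) for _ in range(1000)]

batch = sample_resting_inputs(resting, segment_len=128, batch_size=8, rng=rng)
```

Because every window comes from an actual recording, the generator input carries the subject and acquisition noise characteristics that white noise lacks.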
The noise reduction module was placed at the generator output to reduce the noise of the artificial data. In the original GAN, fully connected layers did not learn the correlation among adjacent sampling points. The serious glitch noise and frequent fluctuations of the artificial data did not match the rhythm and correlation structure of real EEG. The main frequency domain energy of the ERP task was below 20 Hz, which constrained the correlation between adjacent points. This work applied a 0.01-20 Hz band-pass filter and a smoothing module to reduce high-frequency glitch noise and increase the correlation of adjacent points. These noise reduction modules significantly improved the SNR of the artificial data.
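The effect of the smoothing step can be illustrated with a simple centered moving average. This is a stand-in under stated assumptions: the actual smoothing module is not specified here, the 0.01-20 Hz band-pass filter is not reproduced, and the window size and roughness measure are hypothetical.

```python
def moving_average(signal, window=5):
    """Centered moving average; the window shrinks at the edges to keep the length."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed

def roughness(signal):
    """Mean absolute difference between adjacent samples (lower means smoother)."""
    steps = [abs(b - a) for a, b in zip(signal, signal[1:])]
    return sum(steps) / len(steps)

raw = [0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0]   # glitch-like generator output
smooth = moving_average(raw, window=3)
```

Averaging neighbouring points suppresses sample-to-sample jumps, which is the sense in which smoothing increases the correlation between adjacent points.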
The Wasserstein distance increased the convergence speed of GAN training. WGAN is widely applied in the field of image generation and effectively solves the problem of unstable training. Researchers found that WGAN also converged well on one-dimensional time series signals. WGAN achieved complete convergence with only 350 iterations, and its artificial sample quality was better than that of the original GAN with 7000 iterations. The number of iterations and the training time of ERP-WGAN were reduced by 95%. The above three adaptive improvements for EEG characteristics contributed to the high convergence and stability of ERP-WGAN.
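In a WGAN, the critic is trained to widen the gap between its mean score on real samples and on generated samples, and the generator is trained to raise the critic score of its outputs. The sketch below writes these two standard losses on plain lists of critic scores and checks the 95% figure from the iteration counts quoted above. The Lipschitz constraint on the critic (weight clipping in the original WGAN formulation) is omitted, and the scores are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def critic_loss(real_scores, fake_scores):
    """Minimizing this maximizes E[D(real)] - E[D(fake)]."""
    return mean(fake_scores) - mean(real_scores)

def generator_loss(fake_scores):
    """The generator tries to raise the critic score of its samples."""
    return -mean(fake_scores)

def iteration_saving(baseline_iterations, new_iterations):
    """Relative reduction in the number of training iterations."""
    return 1.0 - new_iterations / baseline_iterations

real_scores = [1.0, 3.0]     # hypothetical critic outputs on real trials
fake_scores = [-1.0, 1.0]    # hypothetical critic outputs on artificial trials

d_loss = critic_loss(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
saving = iteration_saving(7000, 350)
```

Because the critic loss tracks an estimate of the distance between the real and generated distributions, it gives a usable gradient even when the two distributions barely overlap, which is the usual explanation for the more stable training.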

Conclusion
In this work, we proposed a data generation framework based on ERP-WGAN. A variety of modules were designed to improve the convergence of the network for EEG signals with randomness and low signal quality. The framework effectively solved the problem of generating artificial EEG samples in the single-trial condition. The verisimilitude and diversity of artificial samples were evaluated comprehensively. The main contributions of the ERP-WGAN method are as follows. Firstly, an improved GAN structure for EEG characteristics was proposed. ERP-WGAN improved the convergence and stability, and reduced the network training time by 95%. Secondly, the imbalance between positive and negative samples was alleviated by the data augmentation approach, and the classification performance was significantly improved in both general and subject-classifier tests. In addition, we recommended several better mixing ratios of artificial data and real data. Finally, the ERP-WGAN method reduced the subject number by 73% and maintained outstanding classification performance, which greatly shortened the acquisition time and saved experimental cost. This study has broad application prospects for small-scale time-locked EEG data sets, such as identity authentication, target detection and letter spelling. Future research should explore generic GAN networks for time-locked and long-term EEG, and focus on the miniaturization of the generation model.

Fig. 3 presents the artificial EEG data generated by ERP-WGAN. The red curve represents the average real EEG signal when the 19 subjects notice the target image. The black curve visualizes the average of 20000 artificial EEG signals in each channel. The artificial EEG samples were consistent with the real EEG data in the overall trend, and they also effectively imitated the real EEG in the details. P300 is the most obvious EEG component of the target trial activation, and its latency and peak are shown in Fig. 3. Both artificial and real samples showed the same average latency of 0.516 s. The P300 latency was 400-600 ms in most channels. The latency variance of the artificial data was smaller than that of the real data by less than 0.02, so the two variances were very close. The P300 peak amplitudes of artificial data and real data were similar, but the amplitude of artificial data was lower in most channels. In addition, the P300 peak variance of artificial data was also slightly smaller than that of real EEG signals. In summary, artificial data have excellent imitation performance, but their peak and variance are slightly smaller than those of real data, indicating that the diversity of artificial samples still needs to be improved. In contrast to the P300 peak, which was lower in artificial data than in real data, the early ERP component amplitude of artificial data was slightly higher. The artificial data and real data coincided well in the Cz and C4 channels, and both showed obvious P1 and P2 components. We speculate that the generator keeps the energy of artificial data and real data equal, resulting in fluctuations of different ERP components whose amplitude differences are approximately complementary. After the P300 component, the real EEG signal presented a slow wave with negative voltage, which also appeared in the artificial EEG and gradually tended to zero voltage. The artificial EEG signals accurately learned the detail components of 900-1500 ms in most channels. ERP-WGAN can not only emulate the obvious components of real EEG, but also simulate the subtle jitter components within the slow wave components.
The standard deviation of artificial data demonstrated that the single-trial EEG had high volatility, which indicated significant differences between different artificial EEG samples. The diversity of artificial EEG samples reduces the risk of overfitting in the training set, and approximates the volatility and randomness of real EEG. On the one hand, the artificial EEG signals accurately imitated the real EEG signals: the mean values of the two signals were close in latency, amplitude, components and details. On the other hand, the artificial EEG signal took the real EEG signal as the baseline mode from which to expand and diverge. Single samples of artificial EEG had rich diversity. Fig. 4 visualizes the learning performance of ERP-WGAN in scalp [...]


Fig. 3. The grand average of ERPs in the main P300 channels.

Fig. 4. Comparison of the topographical maps of artificial and real EEG signals.

Fig. 5. The verisimilitude and diversity of ERP-WGAN compared with three conventional data generation methods.

Fig. 6. The clustering of ERP-WGAN compared with three data generation methods and two sampling methods used to alleviate data imbalance.

Fig. 7. Subject-classifier performance of mixed samples tested with LDA and LR classifiers.

Fig. 8. AUC and ROC curves of the six sample imbalance processing methods tested with the LR classifier.

Fig. 9. Subject-classifier AUC results of the ERP-WGAN method compared with those in the original literature.

Fig. 10. Performance of the general classifier based on 15 mixed sample schemes.
R.Zhang et al.