Contextual Imputation with Missing Sequence of EEG Signals Using Generative Adversarial Networks

Missing values are prevalent in real-world data; they arise for various reasons, such as user mistakes or device failures. They often cause critical problems, especially in medical and healthcare applications, since they can lead to incorrect diagnoses or even system failure. Many recent imputation techniques adopt machine learning-based generative methods, such as generative adversarial networks (GANs), to deal with missing values in medical data. They are, however, incapable of reproducing realistic time-series signals that preserve important latent features such as sleep stages, which are important context in many medical applications using electroencephalogram (EEG) signals. In this study, we propose a novel GAN-based technique that generates realistic EEG signal sequences which not only look natural but are also correctly classified by sleep stage, by implanting the latent features in the synthetic sequence. Through experiments, we confirm that our model not only generates more realistic EEG signals than a recent GAN-based model but also preserves auxiliary information such as sleep stages. Furthermore, we demonstrate that existing machine learning methods based on EEG data still work well, without sacrificing performance, when using data imputed by our method.


I. INTRODUCTION
In most time series data analysis, missing values arising for various reasons, such as user mistakes or device failures, lead to performance degradation or even cause system failure. Recent imputation techniques have adopted not only traditional statistical imputation but also machine learning-based generative methods to deal with missing values. These methods, however, are incapable of generating realistic time-series signals that carry the latent information needed by the target application, such as sleep disorder diagnosis based on electroencephalogram (EEG) signals.
Having a complete dataset in the real world is unfortunately almost impossible [1]. In medicine and healthcare in particular, it has been reported that the majority of records contain a large number of missing values [2], [3]. A failed recording may be caused by a malfunction of the recording device, lost records or a mistake in electrode attachment [4]. In addition, it is difficult to record complete EEG data because of the strict requirements on the recording environment and on the subjects [5]. Accordingly, most applications utilizing such medical datasets suffer from missing values, so they may raise wrong alerts or produce incorrect diagnoses [6], [7]. Furthermore, doctors or clinicians may also have trouble scoring sleep stages or diagnosing sleep disorders with such missing data, because according to [8] they need to consider the context given by the preceding and following signal sequences. To make matters worse, a missing value in such circumstances usually recurs until its cause is removed, since recording lasts a long time without frequent monitoring. Thus, existing imputation methods cannot handle such cases effectively, even though they can reconstruct a single or short run of missing values by interpolation based on adjacent non-missing values.
Our contribution: In this paper, we develop a novel deep neural network-based technique to complete missing EEG signals so that they not only look natural but also preserve the contextual information that is significant for the analysis of the data. In detail, we assume that our dataset includes sequences of EEG signals and annotations of sleep stages periodically labelled in the sequences. The sleep stage is a category with 5 types, namely REM, Sleeping 1∼3 and Wake, which is annotated by experts for the diagnosis of epileptic, neurological and sleep disorders, the measurement of mental health conditions, and psycho-physiological research [9]. Hence, preserving such characteristics in the created EEG signals is an important goal of our work, and to the best of our knowledge there is no existing technique that completes missing values while considering such contextual features.
VOLUME 4, 2016
To generate realistic EEG signals to replace missing values, we adopt a generative adversarial network (GAN). Owing to its successful applications in image generation, GAN has been widely used for the imputation of image and time series data as well. Image inpainting, which fills in missing pixels of a picture, has also achieved remarkable performance by using GANs [10]-[13]; however, all of these techniques assume that a complete dataset, that is, images without missing parts, is available. Furthermore, a recent work [14] utilizes GAN to generate fake EEG signals for data augmentation, but the model does not consider any context at all. Therefore, we adopt GAN not only for EEG signal generation but also for acquiring contextual information for data augmentation. The contribution of our work can be summarized as follows:
• We suggest a novel GAN-based technique to generate synthetic EEG signals which look realistic and retain an important feature in the medical context, namely the sleep stage.
• In experiments, we confirm that our proposed model generates realistic EEG signals by showing the similarity between real and fake ones in both the time and frequency domains, as well as by evaluating the accuracy of classifying the sleep stages of generated signals. Furthermore, we also show that applications based on EEG data still work well, without sacrificing performance, when missing signals are replaced with synthesized ones.
In this work, we evaluated our generative model mainly on EEG signals with sleep stage labels, but the model can be easily extended to learn and generate other time series data, such as electrocardiogram (ECG) signals labelled with types of arrhythmia for heart disease diagnosis.

II. RELATED WORK
Imputation of missing values: Missing values in a dataset can largely be tackled with two strategies: the inherent consideration of missing values by developing robust models or algorithms that are not affected by missing values, and the explicit modification of the dataset by imputation or deletion to obtain a complete dataset without missing values. Deletion, which simply removes all records or even columns that include any missing values, and simple statistical imputation, which replaces missing values with the mean, median or most frequent value, have been used traditionally [2]. However, deletion may lose too much information in a dataset, and simple imputation usually fails to produce plausible data that look realistic in context. Moreover, the first strategy of developing a robust model that handles missing values inherently has the limitation that a separate technique must be devised for every application of the dataset.
Recent developments in machine learning techniques have enabled us to replace missing values with realistically generated ones. The imputation methods for time series in [15] and user-rated movie scores in [16] formulated their data as a user-item matrix and utilized matrix factorization to estimate the missing entries. Early on, recurrent neural networks were used for imputing missing values in time series data [17], and recent generative models such as the autoencoder and the generative adversarial network have also been exploited to deal with missing values; for example, both models were extended to estimate missing entries in tabular datasets in [1], [18], and GANs in particular were adopted extensively for image inpainting, which completes the missing part of an image, in [10]-[13], [19]. In [20], an imputation technique for time-series sensor data was developed using a sequence-to-sequence network, which is suitable for discrete time-series data. Moreover, [21] and [22] employ an autoencoder and a transformer, respectively, to reconstruct missing regions in input frames.
These state-of-the-art techniques have shown remarkable performance; however, they require complete datasets for training. In fact, most datasets obtained in the real world inevitably suffer from missing values, such as EEG signals collected from medical devices, seismic signals from distributed sensors and many other observations in nature. Image inpainting works well using GANs since images without missing values can be obtained without difficulty. These works adopt GAN architectures as we do, but they are based on the assumption that complete instances are available for training.
Generative adversarial network (GAN): As mentioned before, we adopt GAN to generate the missing values in this paper. GAN is a framework introduced by [23]. It trains a generator G and a discriminator D together; G generally maps a noise signal to a fake sample which is indistinguishable from real images, while D classifies between real and fake samples. GANs have proven to be effective in various problems and domains such as image generation [24], [25], image translation [26], [27], audio synthesis [28], [29], image resolution enhancement [30], [31], image de-noising [32], [33] and feature generation [34].
When labels for context, such as categories, are available in the training dataset, we can modify GAN to exploit such information to generate more realistic instances using an auxiliary classifier [35]. As the auxiliary classifier learns to classify the additional label correctly, the generator is trained to generate samples whose characteristics look natural given the labels. We also adopt the auxiliary classifier to generate realistic EEG signals considering the sleep stages available in the dataset. Moreover, as several optimization techniques such as Wasserstein GAN (WGAN) [36] and gradient penalty [37] have been developed to address the inherent instability of GAN training [38], we also exploit these techniques in our model to stabilize the training process.
Machine learning for EEG sequence generation: In medicine and healthcare, machine learning has also been widely adopted to complete missing values [2], [39]. While these works were based on traditional approaches such as PCA/ICA, multilayer perceptrons, random forests and SVMs, GAN has recently been used in many applications in the area as well. For example, it is used for generating synthetic EEG signals in [14], [40], [41] for the purpose of data augmentation. Other traditional techniques, SMOTE [42] and ADASYN [43], which were proposed to alleviate class imbalance, have also been exploited to enhance an automated classifier of EEG signals [44]. However, those techniques are not adequate for generating realistic EEG signals. Another variation of GAN was introduced in [45] to up-sample EEG signals to a higher resolution. In addition, [5] found that DCGAN outperforms traditional methods for augmenting EEG signals, such as geometric transformation, the autoencoder and the variational autoencoder.
These works have been shown to generate EEG signals successfully and to be useful for improving classification. However, they neither consider missing values in the EEG sequences used for training nor create realistic and practically useful samples exhibiting contextual features such as sleep stages.
Although recurrent neural network (RNN) based models are capable of handling sequences, long short-term memory (LSTM) networks fail on long sequences, for example sequences of 480 points [46]. Because a sequence of EEG signal usually contains 3,000 points or more, we do not consider RNN- or LSTM-based models.

III. SIG-GAN: GENERATIVE ADVERSARIAL NETWORKS FOR SIGNAL SEQUENCES
In this section, we define notations for describing data and models and then introduce our proposed SIG-GAN, a GAN-based network for imputing missing signal sequences in EEG data.
Data description: Let $S = \{S_1, \ldots, S_n\}$ be a collection of $n$ sequences of signal segments, where $S_i$ denotes a sequence $\{s_{i1}, \cdots, s_{it_i}\}$ of $t_i$ signal segments $s_{ij}$ ($i \in [1, n]$ and $j \in [1, t_i]$). Each sequence of signal segments $S_i$ contains EEG signals (e.g., a signal recorded 3,000 times during 30 seconds at a 100 Hz sampling frequency) collected through a medical test called a polysomnography (PSG) study using electronic devices, and has a label $c_i$, which is one of 5 types of sleep stages annotated by medical doctors or technologists. The notation $c_i$ is used interchangeably to denote the sleep stage name or a one-hot encoded vector in this paper. Note that, by our assumption, a signal segment in the dataset may contain missing values.
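The data layout described above can be illustrated with a minimal Python sketch. The constant names, the stage ordering, and the use of `None` to mark a missing segment are assumptions made for illustration only; the text fixes just the segment length (30 seconds at 100 Hz) and the five sleep stages.

```python
# Hypothetical constants for illustration; the values follow the text
# (30-second segments sampled at 100 Hz, five sleep stages).
SAMPLING_HZ = 100
SEGMENT_SECONDS = 30
SEGMENT_LEN = SAMPLING_HZ * SEGMENT_SECONDS  # 3,000 samples per segment
STAGES = ["W", "N1", "N2", "N3", "R"]  # ordering is an assumption


def one_hot(stage):
    """Return the one-hot vector c_i for a sleep-stage name."""
    idx = STAGES.index(stage)
    return [1.0 if k == idx else 0.0 for k in range(len(STAGES))]


def is_missing(segment):
    """Treat a segment as missing when it is None (an assumed convention)."""
    return segment is None


# One recording as a list of (segment, stage) pairs; the second segment
# is lost, e.g. because of an electrode failure.
night = [
    ([0.0] * SEGMENT_LEN, "W"),
    (None, "N1"),
    ([0.0] * SEGMENT_LEN, "N2"),
]
```

In this representation a label can be read either as a stage name or, through `one_hot`, as the encoded vector, mirroring the interchangeable use of $c_i$ in the text.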
To describe the data collection process in a PSG test briefly, the test is performed overnight with a patient while body functions of the subject such as brain activity (EEG), eye movement (EOG) and heart rhythm (ECG) are continuously recorded, i.e., those signals represent the electrical activity of each organ. Furthermore, as the technicians monitor the subject, they periodically score a signal segment, usually 30 seconds long, as one of 5 sleep stages, which represent the stages of the sleep cycle, namely W, N1, N2, N3 and R, following scoring manuals such as that of the American Academy of Sleep Medicine (AASM) [8]. Accordingly, each sequence $S_i$ corresponds to a sequence of 30-second long signal segments collected through a night from a patient.
Motivation and problem definition: In a PSG test, recording failures occasionally occur for various reasons, such as the malfunction of the electrodes. Since a recording error can last a long time until its cause is corrected by technicians, the failure may result in a long sequence of missing signals over several segments. According to the sleep stage scoring manual in [8], a practitioner needs to consider the context given by the preceding and following signal segments to determine the sleep stages of given segments. For example, suppose a preceding sequence $S_1$ of 30-second signal segments is annotated as stage N1 and contains, in its last 10 seconds, a K complex, which is strong evidence of stage N2. Then, the following sequence $S_2$ is scored as stage N2 unless there is evidence of a shift to another sleep stage [8]. As stated by the manual, we assume that the contextual information is preserved at least in the sequence just after the preceding one. Therefore, we consider the case in which a sequence (30 seconds long) is lost given a preceding one. Furthermore, computer-aided diagnosis based on EEG signals also depends on the context of sequences for decision making. Thus, such missing segments may cause critical failures in diagnosis.
Hence, we suggest a generative method based on GAN for imputing missing signal segments, which creates fake EEG signal segments that look natural and preserve contextual information such as sleep stages. Fig. 1 illustrates the architecture of our proposed network. To trace the context changing along the EEG signal segments, we adopt a generator G in the manner of an auto-encoder. It takes a signal segment as input and generates the segment that is expected to follow it. The discriminator D determines whether the input segment is fake or real. The auxiliary classifier C infers the sleep stage of a given signal segment as AC-GAN does in [35], which has been shown to stabilize training so that the output of G follows the real input distribution. Naturally, C and D share the convolutional layers, since they should utilize common local features for their own decisions in each downstream network.
For training, we define three types of losses and select training samples for each loss as follows:
• Adversarial loss: It leads G to output a realistic fake signal segment given a preceding segment as input, while it leads D to distinguish between real and fake segments. Computing this loss requires a single signal segment, and the training set of segments $S_i$ sampled from $S$ is referred to as $S_{adv}$.
• Reconstruction loss: This is for fitting G to imitate the following signal segment as closely as possible. To calculate the loss, we sample a set $S_{rec}$ of pairs $\langle S_{i-1}, S_i \rangle$ of two adjacent non-missing signal segments from $S$.
• Prediction loss: It makes G aware of the contextual information of sleep stage $c_i$ and guides C and D to capture the real data distribution stably. The training datasets for computing the loss are split into two cases: $S_{pred,0} = \{\langle S_{i-1}, c_i \rangle\}$ for computing the loss and learning the parameters of G with a fake segment $G(S_{i-1})$ which estimates $S_i$, and $S_{pred,1} = \{\langle S_i, c_i \rangle\}$ for training D with a real non-missing segment $S_i$.
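The selection of training samples for the three losses can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function name, the `None` marker for a missing segment, and the choice to build $S_{pred,0}$ only when both $S_{i-1}$ and $S_i$ are present are assumptions.

```python
def build_training_sets(segments, labels):
    """Split one recording into the sample sets used by the three losses.

    segments: list of signal segments; None marks a missing segment
              (an assumed convention, not taken from the paper).
    labels:   list of sleep-stage labels c_i, one per segment.
    """
    s_adv, s_rec, s_pred0, s_pred1 = [], [], [], []
    for i, seg in enumerate(segments):
        if seg is None:
            continue  # missing segments contribute no real sample
        s_adv.append(seg)                 # single real segments
        s_pred1.append((seg, labels[i]))  # <S_i, c_i>, real segment
        prev = segments[i - 1] if i > 0 else None
        if prev is not None:
            s_rec.append((prev, seg))          # <S_{i-1}, S_i>, both present
            s_pred0.append((prev, labels[i]))  # <S_{i-1}, c_i>, used with G
    return s_adv, s_rec, s_pred0, s_pred1
```

For a recording `[a, b, None, d]`, only the pair `(a, b)` qualifies for the reconstruction set, because `d` has no available predecessor.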
Adversarial loss: To compute the adversarial loss of G and D, the training set $S_{adv}$ of segment samples $S_i \in S$ is utilized. By adopting the adversarial loss of WGAN [36], we formulate the corresponding optimization problem as Eqn. (1), where $\theta_G$ and $\theta_D$ are the trainable parameters of generator G and discriminator D respectively. While D is trained to tell the observed segment $S_i$ as real and the forged segment $G(S_{i-1})$ as fake, G learns to output $G(S_{i-1})$ that deceives D into answering that it is real by minimizing the second term in Eqn. (1). Furthermore, since the adversarial loss suffers from unstable training [37], we add the gradient penalty in Eqn. (1), where $\hat{S}$ is a segment sampled from the linearly interpolated distribution $P(\hat{S})$ between the real and generated data [36].
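As a sketch, assuming the standard WGAN objective with gradient penalty [36], [37] and the convention that D maximizes while G minimizes, one form consistent with the description above is the following. The exact signs, the sampling of the interpolation coefficient $\epsilon$, and the placement of the weight $\lambda_1$ are assumptions.

```latex
\min_{\theta_G} \max_{\theta_D} \;
  \mathbb{E}_{S_i \in S_{adv}} \left[ D(S_i) \right]
  - \mathbb{E}_{S_{i-1}} \left[ D\left(G(S_{i-1})\right) \right]
  - \lambda_1 \, \mathbb{E}_{\hat{S} \sim P(\hat{S})}
      \left[ \left( \left\| \nabla_{\hat{S}} D(\hat{S}) \right\|_2 - 1 \right)^2 \right],
\qquad
\hat{S} = \epsilon S_i + (1 - \epsilon)\, G(S_{i-1}), \quad \epsilon \sim U[0, 1].
```

In this form D raises its score on real segments and lowers it on generated ones, G minimizes the second term, and the last term keeps the gradient norm of D close to 1 on interpolated segments.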
Reconstruction loss: To obtain a G that imitates the next signal segment given a sample segment, we impose the reconstruction loss, which is defined as the $L_1$ distance between a sample segment $S_i$ in $S_{rec}$ and $G(S_{i-1})$; the related optimization is given in Eqn. (2). Moreover, we apply the reparameterization trick as suggested in [47]. Therefore, our model maps the distribution of signal generation into a Gaussian distribution with mean $\mu_{curr}$ and variance $\sigma_{curr}$ as shown in Fig. 1. We omit the reparameterization trick from Eqn. (2) for readability.
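The two ingredients named above can be sketched in plain Python. The function names are hypothetical, and parameterizing the Gaussian by its log-variance is a common convention assumed here; the text only names a mean and a variance.

```python
import math
import random


def l1_distance(target, generated):
    """L1 distance between a real segment S_i and the output G(S_{i-1})."""
    assert len(target) == len(generated)
    return sum(abs(t - g) for t, g in zip(target, generated))


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    Writing the sample this way keeps it differentiable with respect to
    mu and log_var when the same expression is used inside a network.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]
```

With a very small variance the sample collapses onto the mean, which is a quick sanity check for the sampling step.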
Prediction loss: To achieve our goal that G outputs a fake signal whose sleep stage is correctly recognized, we exploit the auxiliary classifier C as in [35]. With each sample $S_{i-1}$ from the sample set $S_{pred,0} = \{\langle S_{i-1}, c_i \rangle\}$, we define the prediction loss so that G learns to generate $G(S_{i-1})$ whose desired sleep stage is $c_i$. Here, $C(G(S_{i-1}))$ is the sleep stage predicted by C for the input segment $G(S_{i-1})$, and $L(\cdot, \cdot)$ indicates the cross entropy between two distributions. Furthermore, the prediction loss is also utilized for training C with samples $S_{pred,1} = \{\langle S_i, c_i \rangle\}$ to predict the correct sleep stage for a real signal $S_i$, where $\theta_C$ denotes the learnable parameters of C.
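The cross entropy $L(\cdot, \cdot)$ used by the prediction loss can be sketched as below. The helper names and the assumption that C outputs five logits passed through a softmax are illustrative; the same function would be evaluated on $C(G(S_{i-1}))$ when updating G and on $C(S_i)$ when updating C.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the sleep-stage logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, probs, eps=1e-12):
    """L(c, p) = -sum_k c_k * log(p_k); eps guards against log(0)."""
    return -sum(c * math.log(p + eps) for c, p in zip(one_hot_label, probs))
```

For an uninformative classifier that assigns equal probability to all five stages, the loss equals log 5, and it approaches 0 as the probability of the annotated stage approaches 1.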
Overall objective: Our full optimization problem is given in Eqn. (5), where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the hyper-parameters that control the relative importance of the gradient penalty, the reconstruction loss and the prediction loss respectively.
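One plausible way to combine the terms, written as a sketch rather than as the authors' exact equation, is shown below. Here $\mathcal{L}_{adv} = \mathbb{E}[D(S_i)] - \mathbb{E}[D(G(S_{i-1}))]$ and $\mathcal{L}_{gp}$ is the gradient-penalty expectation; the grouping of the terms, their signs, and the treatment of the layers shared by C and D are assumptions.

```latex
\min_{\theta_G, \theta_C} \max_{\theta_D} \;
  \mathcal{L}_{adv} - \lambda_1 \mathcal{L}_{gp}
  + \lambda_2 \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{pred},
\quad\text{with}\quad
\mathcal{L}_{rec} = \mathbb{E}_{\langle S_{i-1}, S_i \rangle \in S_{rec}}
  \left\| S_i - G(S_{i-1}) \right\|_1,
\quad
\mathcal{L}_{pred} = \mathbb{E}_{S_{pred,0}} L\left(c_i, C(G(S_{i-1}))\right)
  + \mathbb{E}_{S_{pred,1}} L\left(c_i, C(S_i)\right).
```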

B. FORMULATION FOR TESTING PHASE
For a missing segment $S_i$ whose preceding segment $S_{i-1}$ is present in $S$, $G(S_{i-1})$ is simply used to impute the missing segment. In a real application of our method, however, detecting missing segments in a sequence of signals is another issue. Fortunately, we can simply utilize the discriminator D to find the missing segments. In our evaluation, we find that D often fails to detect missing intervals if the signals do not look like EEG at all, for instance, simple uniform values or random values. Thus, we additionally trained D with synthetically generated non-EEG signals so that it detects such cases as missing segments.
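The imputation rule of the testing phase can be sketched as follows. The function name and the `None` marker are hypothetical, and `generator` stands for any callable playing the role of the trained G. This sketch fills a gap only when the original predecessor is present, so runs of consecutive missing segments are left untouched.

```python
def impute_missing(segments, generator):
    """Fill each missing segment (None) with generator(previous segment).

    Only segments whose predecessor is present in the original sequence
    are imputed, matching the single-missing-segment case in the text.
    """
    filled = list(segments)
    for i in range(1, len(segments)):
        if segments[i] is None and segments[i - 1] is not None:
            filled[i] = generator(segments[i - 1])
    return filled
```

Detecting which segments are missing is a separate step; in this sketch it is assumed to have been done beforehand, for example by thresholding the output of D.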

IV. IMPLEMENTATION DETAILS
We implemented SIG-GAN using the machine learning framework TensorFlow [48].
Network architecture: For the encoder of generator G, we borrow the architecture of DeepSleepNet [49], which employs two sequences of convolutional layers in parallel. As shown in Fig. 1, the encoder of G passes the input signal through two different 1-dimensional convolutional neural networks, $Enc_{short}$ and $Enc_{long}$, whose filter sizes are 11 and 101 respectively. This architecture aims to capture features that appear at high and low frequencies adaptively. Each signal segment contains 3,000 EEG samples, since the sequence is split every 30 seconds and the data was sampled at 100Hz.
The detailed structure of SIG-GAN is shown in Table 1 and Table 2. A signal segment, a 3,000-dimensional vector, is fed into the encoder, and we set same padding along the time axis to keep the 3,000 dimensionality, while the number of channels decreases through 32, 16 and 8 in both $Enc_{short}$ and $Enc_{long}$, i.e., the encoder has 64, 32 and 16 channels at each layer. The outputs of $Enc_{short}$ and $Enc_{long}$ are then concatenated along the channel axis. After that, we apply the reparameterization trick proposed by [47]. Then, we put two transposed convolutional layers for the decoder.
The discriminator D and auxiliary network C are simply stacks of convolutional layers. They share the first three layers, whose numbers of channels vary from 8 to 32. Then, D and C each end with a fully-connected layer with a single output node and five output nodes respectively. Recall that D judges whether a segment is real or fake and C classifies a segment into the five sleep stages. We use the ReLU activation function [50] in the encoder to allow the model to learn fast, and we adopt the LeakyReLU activation function [51] to encourage the decoder to generate high-quality signals [24]. Moreover, we adopt batch normalization [52] in every layer to relieve the problem of poor initialization [24].
The numbers in parentheses in Table 1 and Table 2 are the number of filters, the filter size and the stride, e.g., (32,11,1) in the second row means that the layer has 32 filters of size 11 with a stride of 1.
In this section, we empirically evaluate the performance of our proposed networks. We implement all deep neural networks using TensorFlow 2 on Python 3.7. All experiments reported in this section are performed on machines with an Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz and 128GB of main memory running Ubuntu 16. We also utilize a single NVIDIA GeForce GTX 1080 Ti GPU equipped with 11GB of memory.
Training details: We utilize the Adam optimizer [53] and set the batch size and learning rate to 16 and 0.0001 respectively. We empirically select the weights for the gradient penalty, reconstruction and prediction losses in Eqn. 5 as $\lambda_1 = 10$, $\lambda_2 = 100$ and $\lambda_3 = 1$. Fig. 2 shows the performance evaluation with varying loss weights; we tested sleep stage classification with DeepSleepNet [49] using a dataset with 12% missing values. The graph shows the classification accuracy when varying $\lambda_1$ from 0.1 to 1,000, $\lambda_2$ from 1 to 10,000 and $\lambda_3$ from 0.01 to 100. The performance does not differ much across the weights, and we determined the default setting accordingly.

A. IMPLEMENTED ALGORITHMS
For comparative performance evaluation, we implemented three models that impute missing data by generating synthetic EEG signals as follows:
• RANDOM: This method imputes missing signals with randomly sampled signals whose values are between -1 and 1. There are three strategies for sampling signals: i) sampling a constant value repeatedly, ii) sampling with a linear function, and iii) independently sampling random values following a uniform distribution. Constant sampling selects a random number in the range [-1, 1] and replaces all missing signals with the selected value. Linear sampling substitutes the missing part with a line whose slope and intercept are randomly determined. Finally, the last strategy samples random values in the range [-1, 1] independently and identically from a uniform distribution, as many times as the number of missing signals. In our experiments, we tested all three strategies, but they show similar performance, and thus we provide the results of the last strategy for RANDOM.
• EEGGAN: It is a GAN-based model presented in [14] to synthetically generate EEG signals.
Evaluation tools for generated EEG signals: Note that the purpose of our technique is to replace missing EEG signals with synthetically generated ones so that medical software or devices dealing with EEG signals operate normally without sacrificing much performance. Thus, in our experiments, we take automatic sleep stage scoring as such an application and utilize two deep learning-based classifiers, DeepSleepNet [49] and SleepEEGNet [55]. These classifiers and the GAN-based EEG signal generators, SIGGAN and EEGGAN, are trained separately, and the synthetic signals are input to the classifiers to test whether they still work well. We implement both classifiers in TensorFlow with the hyperparameters presented in each paper.
• DeepSleepNet [49]: It is a classifier that determines the sleep stage of a 30-second long EEG signal. It includes two sequences of convolutional layers with different-size filters, similar to our SIG-GAN model, for feature recognition. To classify by considering preceding and following signals, it also adopts a bi-directional long short-term memory (LSTM) network.
• SleepEEGNet [55]: This is another classifier that scores sleep stages, whose architecture is similar to that of DeepSleepNet, but it adopts sequential encoder-decoder structures using bi-directional LSTM and employs an attention mechanism as well.
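The three sampling strategies of the RANDOM baseline described in this subsection can be sketched as follows. The function names are hypothetical. For the linear strategy, the text draws a random slope and intercept; sampling the two endpoints instead is an assumption made here so that the line stays inside [-1, 1].

```python
import random


def random_constant(n, rng):
    """Strategy i: one value drawn from [-1, 1], repeated n times."""
    value = rng.uniform(-1.0, 1.0)
    return [value] * n


def random_linear(n, rng):
    """Strategy ii: a straight line between two random endpoints."""
    start, end = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    if n == 1:
        return [start]
    return [start + (end - start) * k / (n - 1) for k in range(n)]


def random_uniform(n, rng):
    """Strategy iii: n i.i.d. draws from the uniform distribution on [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]
```

A missing 30-second segment would then be replaced by, for example, `random_uniform(3000, random.Random(0))`.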

B. DATASET
For the training and test datasets, we downloaded the publicly available Sleep-EDF database [56], which includes 153 recordings obtained from 44 healthy people and 22 patients who had mild difficulty falling asleep. Each recording is about 8 hours long and sampled at 100 Hz. Furthermore, the sleep stage of every 30-second long segment is scored manually by well-trained practitioners according to the R&K rule [57] and the AASM manual [8], which categorizes a segment into 5 classes (W, N1, N2, N3, REM). For preprocessing, we normalize the data to the range [−1, 1].
In the training phase, since our SIG-GAN takes two segments, each of which is sampled for 30 seconds, as inputs, we  [58]; the data is split into training, validation and test sets of 50%, 25% and 25% respectively. Moreover, in our training set, the ratios of segments in W, N1, N2, N3 and REM are 21%, 6%, 41%, 14% and 17% respectively. To mitigate class imbalance, we oversampled the minor classes to balance their ratios, as other works handling EEG data did [49], [59], [60]. For all performance evaluations, we repeated Monte Carlo cross-validation 10 times and averaged the quality measures.
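A random 50%/25%/25% split and the oversampling of minor classes can be sketched as below. The function names are hypothetical, and both the unit being split (recordings or segment pairs) and oversampling by random duplication up to the size of the largest class are assumptions; the text does not specify either.

```python
import random
from collections import defaultdict


def monte_carlo_split(items, rng, train=0.5, val=0.25):
    """Randomly split items into training, validation, and test lists."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def oversample(samples, rng):
    """Duplicate minority-class samples until every class matches the largest.

    samples: list of (segment, stage) pairs.
    """
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced
```

Repeating `monte_carlo_split` with ten different seeds and averaging the resulting metrics would correspond to the repeated Monte Carlo cross-validation mentioned above.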

C. REPRODUCIBILITY OF EEG SIGNAL FEATURES
We first test whether the implemented EEG signal generators reproduce signals similar to real EEG signals. As a case study, we plot some selected real and synthetic time series signals as well as spectrograms showing them in the time-frequency domain. To check whether the generated signals show similar frequency distributions, we examine the energy density over frequency by using band-pass filters.
Signals in time series: In Fig. 3, we plotted some randomly selected samples of both target and generated EEG signals. To see whether SIGGAN produces signals that show the characteristics of sleep stages appropriately, we show real and synthetic signals labelled as W and N2. To select the 30-second segments of SIGGAN for W and N2, we sampled pairs of segments classified as the corresponding sleep stage by DeepSleepNet. Due to the space limit, we show only two pairs of samples, but the figures show that the generated signals are quite similar to the real ones and also mimic the distinguishable features of EEG signals according to the sleep stage.
Quality and diversity: Evaluating the performance of generative models is an open problem. To evaluate the quality and diversity of the generated signals, we measure the inception score (IS) [61] and the Frechet inception distance (FID) [62]. IS measures quality and diversity by computing the KL-divergence between the class distribution given a generated sample and the marginal class distribution, using pre-trained inception networks [63]. IS is formulated as $\mathrm{IS} = \exp\left(\mathbb{E}_{x}\, D_{KL}\left(p(y|x)\,\|\,p(y)\right)\right)$, where $x$ is a sample generated by a GAN model and $y$ is the class predicted by the pre-trained inception model. Since we cannot directly exploit inception networks pre-trained on the ImageNet dataset, we instead train only the sub-part of DeepSleepNet that consists of convolutional neural networks. Similarly, various GAN-based works apply well-known classifiers rather than inception networks to evaluate IS and FID in their own domains, such as audio synthesis [64], [65] and EEG signal generation [14]. We calculate our IS and compare it with those of the real signals, RANDOM and EEGGAN in the second column of Table 3. In the first row, the score of 2.53 for the real signals indicates the upper bound of IS. The results show that the IS of our method is higher than those of the baselines, which indicates that the signals generated by SIGGAN have better quality and diversity than those of the baselines.
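The inception score defined above can be computed from a matrix of predicted class probabilities, one row per generated sample. The sketch below is illustrative: the function name is hypothetical, and the rows are assumed to come from whatever classifier replaces the inception network (here, the convolutional part of DeepSleepNet).

```python
import math


def inception_score(class_probs, eps=1e-12):
    """IS = exp(mean_x KL(p(y|x) || p(y))) for rows of class probabilities."""
    n = len(class_probs)
    n_classes = len(class_probs[0])
    # Marginal class distribution p(y), averaged over generated samples.
    marginal = [sum(row[k] for row in class_probs) / n for k in range(n_classes)]
    kl_sum = 0.0
    for row in class_probs:
        kl_sum += sum(p * (math.log(p + eps) - math.log(q + eps))
                      for p, q in zip(row, marginal) if p > 0.0)
    return math.exp(kl_sum / n)
```

With five classes the score lies between 1, when every sample receives the same prediction distribution, and 5, when predictions are fully confident and spread evenly over the classes.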
Although IS was the first popular metric for evaluating GANs, it does not utilize any ground truth samples. Therefore, FID [62] was proposed to capture the similarity of generated samples to real ones. FID embeds the generated and real samples into a feature space using the pre-trained inception network. The embedded features are then modelled as continuous multivariate Gaussians, and FID measures the distance between the two Gaussian distributions as $\mathrm{FID}(r, x) = \|\mu_r - \mu_x\|_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_x - 2(\Sigma_r \Sigma_x)^{1/2}\right)$, where $(\mu_r, \Sigma_r)$ and $(\mu_x, \Sigma_x)$ are the mean and covariance of the real samples $r$ and the generated samples $x$ respectively. In Table 3, we report the FID between the real signals and the others. Since the real signals have zero distance to themselves, their FID is 0, which is the lower bound of FID. The results show that the signals generated by our model are the closest to the real ones, whereas the FIDs of RANDOM and EEGGAN are far higher. This indicates that, among the compared methods, the suggested model generates the signals most similar to the real signals.
Time-frequency representation: By simply plotting the  Fig. 4. In Fig. 4(a) and Fig. 4(b), we plotted the real (left) and generated (right) signals of sleep stage N2 and N3 respectively. The TFRs also show that the generated signals closely resemble the real EEG signals.
In addition, we also randomly select two segments generated by EEGGAN and show their TFRs in Fig. 5. Note that because EEGGAN does not take any auxiliary inputs such as sleep stages, we cannot choose segments with a specific sleep stage. We find that almost all TFRs of the signals generated by EEGGAN show patterns similar to those in Fig. 5 and do not preserve the features of real EEG signals shown in Fig. 4.
Band pass filter: In Fig. 6, we plotted the magnitude distribution over the frequency domain for the real signals and the synthetic signals generated by SIGGAN. To see whether the generated signals reproduce the contextual features of EEG signals in the frequency domain, we calculated the distribution for three sleep stages, W, N3 and REM, separately. For the analysis, we utilized 10 band-pass filters whose frequency bands range from 0Hz to 30Hz at intervals of 2Hz. According to the AASM sleep stage scoring manual [8] and the research on brain waves in sleep stages [66], EEG signals labelled W show relatively high magnitude in high frequencies, including alpha (8-12Hz) and beta (12-30Hz) waves, compared with those recorded in N3 and REM. It is also stated that when people fall into deep sleep, the signals in low frequencies become stronger than those in other frequencies.
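The per-band magnitude analysis can be approximated with the sketch below. It uses a direct discrete Fourier transform in place of the band-pass filters, which is a substitution made here for brevity; the function name and the half-open band convention are assumptions.

```python
import cmath
import math


def band_magnitudes(signal, fs, band_edges):
    """Average DFT magnitude of `signal` inside each [low, high) band in Hz."""
    n = len(signal)
    # Magnitude spectrum from a direct DFT over the non-negative frequencies.
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        spectrum.append((k * fs / n, abs(acc) / n))
    result = []
    for low, high in band_edges:
        mags = [m for f, m in spectrum if low <= f < high]
        result.append(sum(mags) / len(mags) if mags else 0.0)
    return result
```

For example, with 2 Hz-wide bands `[(lo, lo + 2) for lo in range(0, 30, 2)]` and a pure 10 Hz sine sampled at 100 Hz, the largest value falls in the 10-12 Hz band. The direct DFT is quadratic in the segment length, so a fast Fourier transform would be preferable for full 3,000-sample segments.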
The graphs in the first column of Fig. 6 show that the average and maximum magnitudes in each band peak at high frequencies of about 16-22Hz in both the real and synthetic EEG signals. Furthermore, as REM is known to typically show large amounts of theta (4-8Hz) waves, the graphs in the last column of Fig. 6 also confirm that the signals generated by our SIGGAN look realistic, similar to real-world EEG signals. Most of the distributions of the fake signals generated by EEGGAN, however, resemble the graph in Fig. 7 regardless of the sleep stage.

D. EVALUATION BY SLEEP STAGE SCORING
Our SIGGAN model aims to impute missing EEG signals with realistic synthetic data generated from the preceding signals, so that devices and software utilizing EEG signals measured in a polysomnography (PSG) study can continue to work. Thus, it is desirable that the segment generated to impute the missing part preserves the sleep stage score it would have received had it been measured without failure. In our experiment, we test whether the generated signals are correctly classified, so that the quality of data in EEG applications is preserved.
Classification with individual segments: To evaluate the performance in EEG applications, we built a test dataset by sampling 5,762 pairs of adjacent 30-second segments as ground truth. We then tested whether applications such as DeepSleepNet and SleepEEGNet assign the generated EEG segments, which SIGGAN outputs based on the first 30-second signal of each pair, the same sleep stage labels as the ground truth segments. Note that we selected the test dataset to be evenly distributed over sleep stages.
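One way to build such a stage-balanced set of adjacent pairs is sketched below. The sampling procedure is not specified in the text, so this is only an assumption: pairs are grouped by the label of the second (target) segment and the same number of pairs is drawn from each group. The function name `balanced_adjacent_pairs` and the `per_stage` parameter are hypothetical.

```python
import random
from collections import defaultdict


def balanced_adjacent_pairs(segments, labels, per_stage, rng):
    """Sample (previous, target, target_label) triples, per_stage for each label."""
    by_stage = defaultdict(list)
    for i in range(1, len(segments)):
        # Assumption: the pair is assigned to the stage of its second segment.
        by_stage[labels[i]].append((segments[i - 1], segments[i], labels[i]))
    sample = []
    for stage in sorted(by_stage):
        pool = by_stage[stage]
        # Draw without replacement; rare stages contribute all available pairs.
        sample.extend(rng.sample(pool, min(per_stage, len(pool))))
    return sample


# Toy recording: integers stand in for 30-second segments.
toy_segments = list(range(10))
toy_labels = ["W"] * 5 + ["N2"] * 5
pairs = balanced_adjacent_pairs(toy_segments, toy_labels, 3, random.Random(0))
print(len(pairs))
```

The first element of each triple would be fed to the generator, and the classifier's label for the generated output would be compared with the third element.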
Overall, DeepSleepNet classifies the signals generated by SIGGAN into their correct sleep stages with 65.67% accuracy on average, while its accuracy is 82.85% for real signals, as shown in Table 4. In Fig. 8, we depict the confusion matrices that DeepSleepNet produces with the test dataset. In each matrix, a row represents the ratios at which segments of one sleep stage are classified into each sleep stage by DeepSleepNet. The results show that DeepSleepNet achieves about 80% accuracy with the ground truth dataset, as shown in Fig. 8(a). Fig. 8(a) indicates that since the real EEG signals of the N1 and R stages look similar (e.g., they typically have a large amount of low frequency such as alpha waves), DeepSleepNet often confuses them, as demonstrated in the second and last rows. Similarly, for the synthetic EEG signals generated by SIGGAN, the classification tends to be wrong for stage R as shown in Fig. 8(b), but it still obtains reasonably high accuracy for the signals of N1 and R. For sleep stages N2 and N3, which are characterized by low-frequency, high-amplitude activity such as delta waves, the classifier shows quite low accuracy with real EEG signals, and hence also confuses the segments of N2 and N3 with the generated signals, as shown in the third row of the confusion matrix in Fig. 8(b). The missing part of the N3 stage is largely misclassified as N2 because, in the original training data used for fitting our GAN model, the signals of N3, which represents the deepest stage of sleep, were not long enough. Thus, SIGGAN tends to generate signals of N2 instead of N3.
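The row-wise ratios and the overall accuracy discussed above can be computed as in the following sketch. The stage order W, N1, N2, N3, R is an assumption chosen to match the description of N1 as the second row and R as the last row; the function names and the toy labels are illustrative only and are not taken from DeepSleepNet.

```python
STAGES = ["W", "N1", "N2", "N3", "R"]  # assumed row and column order


def confusion_matrix(true_labels, predicted_labels, stages=STAGES):
    """Return raw counts and row-normalized ratios (rows: true stage)."""
    index = {stage: i for i, stage in enumerate(stages)}
    counts = [[0] * len(stages) for _ in stages]
    for truth, predicted in zip(true_labels, predicted_labels):
        counts[index[truth]][index[predicted]] += 1
    ratios = []
    for row in counts:
        total = sum(row)
        # Each row sums to 1 unless the stage never occurs in the ground truth.
        ratios.append([c / total if total else 0.0 for c in row])
    return counts, ratios


def accuracy(counts):
    """Fraction of segments on the diagonal of the count matrix."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / total if total else 0.0


# Toy example: ground-truth labels versus labels predicted for generated segments.
truth = ["W", "W", "N1", "R"]
predicted = ["W", "N1", "N1", "N1"]
counts, ratios = confusion_matrix(truth, predicted)
print(ratios[0], accuracy(counts))
```

Because the test set is balanced over sleep stages, the overall accuracy is close to the mean of the diagonal entries of the ratio matrix.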
Classification with EEG signal sequences: Varying the ratio of missing data in the signal sequences (from 0% to 50%), we tested the accuracy of sleep stage scoring by DeepSleepNet and SleepEEGNet and plotted the results in Fig. 9. We evaluated the performance with three EEG signal generators: RANDOM, EEGGAN and SIGGAN. Because RANDOM and EEGGAN cannot be forced to consider the preceding signals when imputing the missing data, the missing segments were replaced with segments they generated independently and individually. For SIGGAN, we generated segments for the missing ranges by providing the previously measured EEG signals. Since the segments are randomly dropped with the selected probability when we create the test dataset, missing segments may be located consecutively; in that case, we simply generate the next segment from the previous one, which is itself an output of SIGGAN. Both graphs confirm that SIGGAN significantly outperforms the other methods over all ranges of missing data ratios. DeepSleepNet does not suffer much performance degradation with our method: even when we remove 48% of the segments, it shows 75.75% accuracy. Note that the classifier achieves an accuracy of 82.94% with the ground truth dataset. For the other application, SleepEEGNet, the accuracy with signals imputed by SIGGAN decreases by only 1.47% at a missing data ratio of 50%.
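The dropping and sequential filling procedure described above can be sketched as follows. The callable `generate_next` is a hypothetical stand-in for the trained SIGGAN generator, and keeping the first segment of every sequence observed is an assumption of this sketch, made so that a preceding segment always exists.

```python
import random


def drop_segments(segments, missing_ratio, rng):
    """Replace each segment except the first by None with probability missing_ratio."""
    return [seg if i == 0 or rng.random() >= missing_ratio else None
            for i, seg in enumerate(segments)]


def impute_sequence(segments, generate_next):
    """Fill None entries from the previous segment, measured or generated."""
    filled = []
    for seg in segments:
        if seg is None:
            if not filled:
                raise ValueError("the first segment must be observed")
            # For consecutive gaps, filled[-1] is itself a generated segment.
            seg = generate_next(filled[-1])
        filled.append(seg)
    return filled


# Toy usage: one-sample "segments" and a dummy generator that adds 0.5.
sequence = [[1.0], None, None, [4.0]]
imputed = impute_sequence(sequence, lambda prev: [prev[0] + 0.5])
print(imputed)

dropped = drop_segments([[0.0]] * 8, 0.5, random.Random(0))
print(sum(seg is None for seg in dropped), "of", len(dropped), "segments dropped")
```

Generators that ignore the context, such as RANDOM and EEGGAN in the experiment, fit the same interface by discarding the `prev` argument.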
Discovering effects of each component: We designed experiments to explore the effect of each component in our model. To this end, we train SIGGAN without each component and evaluate the performance as in the previous section, Classification with individual segments.
FIGURE 10: Comparison between the target signal and the signal generated without reconstruction loss. The first row shows the target signal. The second row shows the signal generated without reconstruction loss. Without reconstruction loss, the generated signal includes high-frequency noise.
We first train SIGGAN without the adversarial loss, which consists of the GAN loss (Eqn. 5) and the gradient penalty loss (Eqn. 6). In Table 4, the result shows that the model achieves only 45.05% accuracy without the adversarial loss. Although the model still uses the reconstruction loss and the prediction loss, it does not properly learn to generate signals. This means that the GAN is an essential part of the model. Moreover, we investigate how the auxiliary classifier affects the model performance in Table 4. The results show that our model performs better with the auxiliary classifier than without it. In addition, Table 4 reveals the contribution of individual loss functions such as the gp-loss and the prediction loss. Without either of them, the generated signals are not classified as well as those of the proposed model: without the gp-loss, the accuracy is 62.61%, and without the prediction loss, the accuracy is only 24.29%. This shows that each of these components and loss functions is required to obtain the best result. Moreover, we train the model without the reconstruction loss and evaluate the performance. The accuracy is 71.52%, as shown in Table 4, which is the best classification performance. However, our goal is to generate realistic signals, and the signals generated without the reconstruction loss are not realistic, as shown in Fig. 10: they contain high-frequency noise. Our analysis is that the reconstruction loss drives the model to generate signals as realistic as the target signals, whereas without it the model focuses on the prediction loss. Consequently, without the reconstruction loss the model produces signals that achieve high accuracy but are unrealistic.
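To make the ablation figures easier to compare, the short sketch below computes the change in accuracy of each ablated variant relative to the complete model. It uses only the percentages quoted in the text and assumes that the 65.67% reported for SIGGAN with DeepSleepNet is the baseline against which the ablations in Table 4 are measured. The variable and function names are illustrative.

```python
FULL_MODEL_ACCURACY = 65.67  # % for the complete SIGGAN, as quoted in the text

ABLATION_ACCURACY = {
    "without adversarial loss": 45.05,
    "without gp-loss": 62.61,
    "without prediction loss": 24.29,
    "without reconstruction loss": 71.52,
}


def accuracy_deltas(full, ablations):
    """Percentage-point change of each ablated variant relative to the full model."""
    return {name: round(acc - full, 2) for name, acc in ablations.items()}


deltas = accuracy_deltas(FULL_MODEL_ACCURACY, ABLATION_ACCURACY)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} points")
```

The only positive change belongs to the variant without reconstruction loss, which is the case the text sets aside because its signals contain high-frequency noise.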

VI. CONCLUSION
In this paper, we developed SIGGAN, a GAN-based deep neural network to impute missing data in sequences of EEG signals. The proposed model was devised to acquire the context from preceding signals and create realistic signals using auxiliary labels such as sleep stages. In the experiments, we validated that the proposed model not only generates EEG signals that are realistic compared with real signals, but also produces signals that reproduce sleep stages, which are an important characteristic of EEG signals, better than the recent GAN-based model. Using existing automatic sleep stage scoring models, we demonstrated that these models still work correctly with the imputed dataset, and that our model generates missing EEG signals realistically in terms of sleep stages.