fNIRS-GANs: data augmentation using generative adversarial networks for classifying motor tasks from functional near-infrared spectroscopy

Objective. Functional near-infrared spectroscopy (fNIRS) is expected to be applied to brain–computer interface (BCI) technologies. Since lengthy fNIRS measurements are uncomfortable for participants, it is difficult to obtain enough data to train classification models; hence, fNIRS-BCI accuracy decreases. Approach. In this study, to improve fNIRS-BCI accuracy, we examined an fNIRS data augmentation method using Wasserstein generative adversarial networks (WGANs). Using fNIRS data recorded during hand-grasping tasks, we evaluated whether the proposed data augmentation method could generate artificial fNIRS data and improve classification performance using support vector machines and simple neural networks. Main results. Trial-averaged temporal profiles of WGAN-generated fNIRS data were similar to those of the measured data, except that they contained an extra noise component. By adding the generated data to the training data, the accuracy of classifying four different task types improved irrespective of the classifier. Significance. This result suggests that the artificial fNIRS data generated by the proposed data augmentation method are useful for improving BCI performance.


Introduction
Functional near-infrared spectroscopy (fNIRS) is a neuroimaging technique that measures human brain activity in a non-invasive manner. Using near-infrared light, fNIRS detects relative concentration changes in oxygenated and deoxygenated hemoglobin (ΔOxy-Hb and ΔDeoxy-Hb, respectively) in the cerebral cortex [1,2]. Compared with other neuroimaging methods, fNIRS has the advantages of being portable and easy to use, and is thus applied in many areas, including the monitoring of cognitive ability, neonatal brain function, daily-life situations, and rehabilitation. One application of fNIRS is brain–computer interfaces (BCIs). BCIs are technologies that enable humans to operate external devices using their brain activity, observed with electroencephalograms (EEG) or electrocorticograms as well as with fNIRS, which measures hemodynamic activity [3]. Several studies have demonstrated that BCIs can utilize fNIRS [3][4][5][6][7][8][9]. Nevertheless, higher classification accuracy is desirable for reliable and useful applications.
A common approach to improving BCI accuracy is to find a better classification model. Therefore, previous studies have tested several classification algorithms and developed methods to achieve higher classification accuracy [4][5][6]. While these studies focused on algorithms, the amount of data is also an important factor for achieving high classification accuracy. In general, BCI classification models must be trained using a large amount of data measured by neuroimaging, e.g. fNIRS; however, collecting such large amounts of data is usually difficult, because neuroimaging measurements are lengthy and burdensome for participants even though fNIRS is easy to use. Therefore, only a limited amount of data can be acquired. As a result, classification models cannot be sufficiently trained, which leads to reduced BCI accuracy.
One solution to this problem is data augmentation. Data augmentation is a common approach used in the field of image recognition, where simple manipulations of the original data such as flipping, changing color, and adding noise are known to be effective [10]. Similarly, in BCI research using EEG, several augmentation methods have been proposed to improve the accuracy or reduce the calibration time [11][12][13][14][15].
Here, we aimed to improve fNIRS-BCI accuracy by augmenting fNIRS data using generative models. Generative models that use neural networks, e.g. generative adversarial networks (GANs) [16], are being actively applied to a wide range of computer science research, especially image generation [16][17][18]. In medical imaging analysis, one previous study succeeded in improving liver detection using GANs [19]. Several data augmentation methods using GANs have also been tested for neuroimaging using EEG [20][21][22][23]. A recent study also applied GANs to fNIRS simulation data [24]. However, GAN training is unstable, and the applicability of GANs for real fNIRS data has not been tested yet.
In this study, we propose an fNIRS data augmentation method using Wasserstein GANs (WGANs) [25]. A WGAN is a variant of a GAN and consists of a generator and a critic (figure 1(a)). The generator synthesizes fNIRS data (generated data) from a random latent variable z so that they become similar to the measured (original) fNIRS data. The critic evaluates both the generated data and the measured original data. By repeatedly updating the weights of the generator and the critic in turn, the generator learns to produce data similar to the original data (for details, see section 2.1). This approach has been shown to be useful for data augmentation with EEG [20,21]. Here, we applied fNIRS-GANs to measured fNIRS datasets [8,9]. Specifically, the GAN generator was trained using the measured datasets (training process, figure 1(a)), and we then tested whether our proposed data augmentation method could generate artificial fNIRS measurement data and thus improve classification accuracy.

Augmentation using GANs
A GAN [16] generates artificial data by training two neural networks, the discriminator D(·) and the generator G(·), with an adversarial procedure. The GAN value function V(D, G) is expressed as:

V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))],  (1)

where x is the training data given to the discriminator, z is the latent variable that represents the input noise to the generator, and E[·] denotes the expectation. In GANs, the discriminator is trained by maximizing V(D, G) and the generator is trained by minimizing V(D, G).
The discriminator, with the parameters θ_D, is updated by ascending its gradient g_D, which is calculated using the training data {x^(1), …, x^(N)} and the latent variables {z^(1), …, z^(N)}:

g_D = ∇_θ_D (1/N) Σ_{n=1..N} [log D(x^(n)) + log(1 − D(G(z^(n))))],  (2)

where N is the number of training data samples. The generator, with the parameters θ_G, is updated by descending its gradient g_G, which is calculated using the noise {z^(1), …, z^(N)}:

g_G = ∇_θ_G (1/N) Σ_{n=1..N} log(1 − D(G(z^(n)))).  (3)

In other words, the discriminator is trained to maximize the probability of assigning the correct label to both the training data and the generated data, whereas the generator is trained to minimize the probability that the correct label is assigned to the generated data. When θ_D is updated several times before each update of θ_G, minimizing the value function (1) corresponds to minimizing the Jensen–Shannon divergence. The problem is that g_G vanishes because the Jensen–Shannon divergence is not continuous with respect to θ_G, which can stop generator learning. GANs also suffer from mode collapse, in which the generator tends to produce only the most frequent pattern in the training data.
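As a concrete illustration of the GAN value function above, V(D, G) can be estimated from a batch of discriminator outputs (a minimal NumPy sketch with assumed toy values; not part of the study's analysis code):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of the GAN value function V(D, G):
    the mean of log D(x) over real samples plus the mean of
    log(1 - D(G(z))) over generated samples."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# An uncertain discriminator (D = 0.5 everywhere) gives V = 2 log 0.5.
v_uncertain = gan_value([0.5, 0.5], [0.5, 0.5])
```

V grows as the discriminator separates real from generated samples (maximized by the discriminator) and shrinks as the generator fools it (minimized by the generator).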

Wasserstein GANs
WGANs have been proposed to solve the gradient-vanishing problem [25]. Instead of the Jensen–Shannon divergence, a WGAN minimizes the Wasserstein distance, which is continuous and differentiable with respect to θ_G; WGANs also alleviate mode collapse. In the WGAN, the discriminator D is referred to as the critic because it does not discriminate the data. The value function V(D, G) is expressed as:

V(D, G) = E_x[D(x)] − E_z[D(G(z))].  (4)

The critic, with the parameters θ_D, is updated by ascending its gradient g_D, which is calculated using the training data {x^(1), …, x^(N)} and the latent variables {z^(1), …, z^(N)}:

g_D = ∇_θ_D (1/N) Σ_{n=1..N} [D(x^(n)) − D(G(z^(n)))].  (5)

The generator is updated by descending g_G:

g_G = −∇_θ_G (1/N) Σ_{n=1..N} D(G(z^(n))).  (6)

Here, to enforce the Lipschitz constraint, the critic parameters θ_D are clipped to a small range [−c, c].
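The weight clipping described above can be sketched in a few lines (NumPy; the threshold c = 0.01 follows the original WGAN paper and is an assumed value here, since this study uses the gradient penalty instead of clipping):

```python
import numpy as np

def clip_critic_weights(params, c=0.01):
    """Crudely enforce the Lipschitz constraint of the original WGAN
    by clipping every critic parameter into [-c, c] after each update."""
    return [np.clip(w, -c, c) for w in params]

# Toy critic parameters: large entries are clipped, small ones untouched.
theta_D = [np.array([[0.5, -0.002], [0.03, -0.8]])]
theta_D = clip_critic_weights(theta_D, c=0.01)
```

Clipping keeps the critic's weights bounded but can bias it toward overly simple functions, which motivates the gradient penalty introduced next.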
The learning of WGANs is slow compared with that of GANs. Therefore, Gulrajani et al proposed introducing a penalty term (referred to as the gradient penalty, GP) that constrains the gradient norm of the critic output with respect to its input [26]. With the GP, the value function (4) becomes:

V(D, G) = E_x[D(x)] − E_z[D(G(z))] − λ E_x̂[(‖∇_x̂ D(x̂)‖₂ − 1)²],  (7)

and the gradient for θ_D is written as:

g_D = ∇_θ_D (1/N) Σ_{n=1..N} [D(x^(n)) − D(G(z^(n))) − λ(‖∇_x̂ D(x̂^(n))‖₂ − 1)²],  (8)

where x̂^(n) = ε x^(n) + (1 − ε) G(z^(n)), λ = 10 is the gradient-penalty coefficient, and ε is a random variable that follows the uniform distribution U(0, 1). We used WGANs that incorporate this penalty (WGAN-GP) [26] to augment the fNIRS data.
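To make the penalty term above concrete, the sketch below evaluates the critic loss for a toy linear critic D(x) = wᵀx, for which the input gradient ∇_x D is simply w and the penalty has a closed form. This is a hedged illustration, not the study's implementation; real critics are neural networks and require automatic differentiation for this term.

```python
import numpy as np

rng = np.random.default_rng(0)

def wgan_gp_critic_loss(w, x_real, x_fake, lam=10.0):
    """Critic loss for WGAN-GP with a toy linear critic D(x) = w @ x.
    x_hat = eps * x_real + (1 - eps) * x_fake with eps ~ U(0, 1);
    for a linear critic, grad_x D(x) = w regardless of x_hat, so the
    penalty lam * (||grad||_2 - 1)^2 can be written without autodiff."""
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake  # interpolated samples
    grad_norm = np.linalg.norm(w)                # ||grad_x D|| for linear D
    penalty = lam * (grad_norm - 1.0) ** 2
    # The critic maximizes E[D(x)] - E[D(G(z))] - penalty, i.e. it
    # minimizes the negated expression returned here.
    return -(np.mean(x_real @ w) - np.mean(x_fake @ w) - penalty)
```

With ‖w‖₂ = 1 the penalty vanishes, matching the unit-gradient-norm target of equation (7)'s penalty term.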

The proposed fNIRS data augmentation method using GANs
In this study, both the critic and the generator in the WGANs were configured as fully connected feedforward neural networks with three layers (figure 1). After the training process, the data produced by the generator (referred to as generated data) are combined with the original fNIRS data to form the augmented data. The time-series data of all fNIRS channels were given as the input vector x to the critic. Random numbers sampled from a uniform distribution U(−1, 1) formed the input z to the generator, a vector of dimension N_z (= 100). A WGAN was constructed for each task condition; in other words, multiple WGANs were used to generate synthetic data (generated data), one per task. Table 1 shows the network structure of each layer in the critic and the generator, where N_time and N_ch are the numbers of time samples (= 300) and channels (= 41), respectively. A bias term was added to the input and hidden layers. The activation function for the input and output layers of the critic was the identity function, f(x) = x. Because a sparse gradient causes a vanishing gradient in the critic, a leaky rectified linear unit (leaky ReLU),

f(x) = x (x ≥ 0), f(x) = ax (x < 0), with a small positive slope a,

was selected as the activation function for the hidden layer. For the generator, the sigmoid function,

f(x) = 1 / (1 + e^(−x)),

was selected for the output layer, while the identity function and the leaky ReLU were used for the input and hidden layers. The initial value of each weight in the generator and the critic was a random variable drawn from the normal distribution N(0, σ_θ²) with mean 0 and variance σ_θ² [27], where σ_θ = m^(−1/2) and m is the number of units. The weights between units in the critic and the generator were updated by adaptive moment estimation (Adam) [28] with the recommended parameter values α = 0.001, β_1 = 0.9, β_2 = 0.999, and ε = 10^−8 [28]. In addition, the WGAN was trained by updating the weights of the critic five times for every single update of the weights of the generator.
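The activation functions and weight initialization above can be sketched as follows (NumPy; the leaky-ReLU slope a = 0.2 is an assumed value, since the paper does not state it):

```python
import numpy as np

def leaky_relu(x, a=0.2):
    """Leaky ReLU used in the hidden layers; the negative slope a
    is an assumption here (not specified in the paper)."""
    return np.where(x > 0, x, a * x)

def sigmoid(x):
    """Logistic sigmoid used in the generator's output layer."""
    return 1.0 / (1.0 + np.exp(-x))

def init_weights(m_in, m_out, rng):
    """Draw initial weights from N(0, sigma^2) with sigma = m_in**-0.5,
    i.e. the LeCun-style initialization the paper references."""
    return rng.normal(0.0, m_in ** -0.5, size=(m_in, m_out))

rng = np.random.default_rng(0)
W = init_weights(100, 300, rng)  # latent dim N_z = 100 -> a hidden layer
```

The standard deviation m^(−1/2) keeps pre-activation variance roughly constant across layers, which matters for the critic's gradient flow.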
This WGAN training was repeated 10 000 times, and all trials in the training data were used in a single training session.
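The overall update schedule, five critic updates per generator update repeated over the training iterations, can be sketched as follows (the counters stand in for the actual Adam steps on θ_D and θ_G):

```python
def train_schedule(n_iterations=10_000, n_critic=5):
    """Bookkeeping sketch of the WGAN training loop described above:
    the critic is updated n_critic times for every generator update."""
    counts = {"critic": 0, "generator": 0}
    for _ in range(n_iterations):
        for _ in range(n_critic):
            counts["critic"] += 1   # one Adam step on theta_D would go here
        counts["generator"] += 1    # one Adam step on theta_G would go here
    return counts

counts = train_schedule(n_iterations=100)
```

Updating the critic more often keeps its Wasserstein-distance estimate accurate before each generator step.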

Datasets
In this study, we used the data from nine participants collected in a previous study [9]. The experiment was approved by the ethical committee of the Nagaoka University of Technology. Data were measured for the nine participants (one left-handed female and eight right-handed males; 21–26 years old) in accordance with the Declaration of Helsinki. All participants provided written informed consent. The participants performed a total of 80 trials, conducted in five sessions with 16 trials per session. One trial consisted of a simple block design: a 5 s pre-rest, a 10 s task, and a 15 s post-rest. During the task period, the participant either rested or conducted a left-hand, right-hand, or bimanual motor task (figure 2(a)). Thus, four different task conditions were set. Brain activity was measured using a multichannel fNIRS system (OMM-3000, Shimadzu Corp., Kyoto, Japan) with a sampling period of 100 ms. The measured 41 channels covered the sensorimotor areas in both hemispheres (figure 2(b)), such that the center of the source probe surrounded by CH19, 25, 26, and 32 was located at Cz of the international 10–20 system. We also recorded the positions of the probes and reference points (nasion, left ear, and right ear) with a stylus marker (FASTRAK; Polhemus, Colchester, VT, USA). From these data, we confirmed that the measured channels covered the sensorimotor areas (see appendix for details). Note that the actual measurements were performed for 45 channels, including 4 short-distance channel probes (short channels) [8,9]; however, we used only the 41 normal fNIRS channels. In the previous study, the short channels were used to reduce the effects of artifacts in the superficial layers or scalp [29]. However, to focus on the effects of augmentation, the short channels and the reduction of scalp artifacts were not used in the present study.
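A quick consistency check of the dimensions implied by this design (assuming the 80 trials were balanced across the four conditions, which the per-class counts reported later support):

```python
# A 5 s pre-rest, 10 s task, and 15 s post-rest sampled every 100 ms
# give 300 samples per trial, matching the N_time = 300 used by the
# WGAN; 80 trials over 4 conditions give 20 trials per condition.
SAMPLING_PERIOD_S = 0.1
TRIAL_S = 5 + 10 + 15                          # pre-rest + task + post-rest
samples_per_trial = round(TRIAL_S / SAMPLING_PERIOD_S)
trials_per_condition = 80 // 4
```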

Preprocessing
Based on the procedure used in previous studies [8,9], the measured fNIRS data were preprocessed as follows. First, we performed detrending based on a discrete cosine transform (cut-off period: 60 s). Second, baseline correction was conducted so that the average during the 5 s pre-rest period became 0. Finally, smoothing using a moving average (time window: 3 s) was conducted. The preprocessed time series data were divided into trial data and used as inputs to the classifiers to examine the effects of data augmentation on classification accuracy. The analysis was conducted using custom Python scripts with Chainer (https://chainer.org), Scikit-learn (https://scikit-learn.org), and SciPy (https://www.scipy.org) for the data augmentation and the classification.
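The last two preprocessing steps can be sketched for a single-channel trial as follows (a minimal NumPy sketch; the DCT-based detrending is omitted, and the function and parameter names are our own):

```python
import numpy as np

def preprocess_trial(x, fs=10.0, pre_rest_s=5.0, smooth_s=3.0):
    """Baseline-correct a single-channel trial so the mean of the 5 s
    pre-rest period is 0, then smooth with a 3 s moving average.
    fs is the sampling rate in Hz (100 ms sampling period)."""
    x = np.asarray(x, dtype=float)
    n_pre = int(pre_rest_s * fs)
    x = x - x[:n_pre].mean()                    # baseline correction
    win = int(smooth_s * fs)
    kernel = np.ones(win) / win
    return np.convolve(x, kernel, mode="same")  # moving-average smoothing
```

The `mode="same"` convolution keeps the trial length at 300 samples, at the cost of edge effects at the trial boundaries.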

Classification
Next, the effects of data augmentation on classification accuracy in fNIRS-BCIs were evaluated. To see whether the effects of data augmentation were dependent on the type of classifier, we used support vector machines and neural networks [30] to classify the following four types of fNIRS data: left-hand grasping, right-hand grasping, bimanual grasping, and resting.

Cross-validation
The accuracy was defined as the number of correctly classified trials divided by the total number of trials. It was evaluated by ten-fold cross-validation, in which all the data were randomly divided into ten sets: one set (10% of the data) was used as test data, eight sets (80% of the data, referred to as the training data) were used to train the classifiers, and the remaining set (referred to as the validation data) was used to determine the hyper-parameters (kernel types and penalty parameters for the support vector machine (SVM), and the number of epochs for the neural networks (NN); see below). The generated data were synthesized from only the training data in each cross-validation set. Using the generator of fNIRS-GANs, 100 trials of generated data were synthesized for each task (400 trials in total). These generated data were combined with the original training data and used as the augmented data (figure 1(b)). To see the effects of data augmentation, we compared the accuracies of classifiers trained with only the original fNIRS data and with the augmented data. Statistical significance was examined by a paired t-test with the threshold set to 5% (α = 0.05). Note that for both the original and augmented data, the test data were identical; only the training data were changed.
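The fold bookkeeping can be sketched as follows (NumPy; which fold serves as the validation set for a given test fold is an assumption of this sketch, the key point being that WGAN-generated data are synthesized only from the training folds):

```python
import numpy as np

def ten_fold_splits(n_trials, rng):
    """Yield (train, validation, test) index sets for ten-fold
    cross-validation: of the ten shuffled folds, one is test data, one
    is validation data for hyper-parameter selection, and the remaining
    eight are training data. Any WGAN-generated data must be synthesized
    from the training folds only, so nothing leaks from the test fold."""
    folds = np.array_split(rng.permutation(n_trials), 10)
    for i in range(10):
        test = folds[i]
        val = folds[(i + 1) % 10]          # assumed rotation of val fold
        train = np.concatenate([folds[j] for j in range(10)
                                if j not in (i, (i + 1) % 10)])
        yield train, val, test

rng = np.random.default_rng(0)
splits = list(ten_fold_splits(80, rng))
```

With 80 trials this yields 64/8/8 train/validation/test splits per fold, i.e. 16 training trials per class for a balanced four-class design.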

Support vector machines
For SVM classification, we combined binary classifiers into a binary decision tree [31,32], as used in previous studies [8,9], and performed a four-class classification.
The combination of kernel type (linear or Gaussian), penalty parameter, and Gaussian parameter was determined by a grid search. For the linear SVM, the penalty parameter C was searched from 2^−20 to 2^0 in multiplicative steps of 2^0.5. For the Gaussian SVM, the penalty parameter C was searched from 2^0 to 2^10 in steps of 2^0.5, and the Gaussian parameter was searched from 2^0 to 2^10 in steps of 2^0.5. The parameters that showed the highest accuracy on the validation data were selected as the optimal parameters and applied to the test data.
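Written out explicitly, these exponential grids are as follows (NumPy; the name `gamma` for the Gaussian parameter is our labeling, since the paper only calls it "the Gaussian parameter"):

```python
import numpy as np

# Exponents run in steps of 0.5, so the linear-SVM penalty grid
# 2**-20 ... 2**0 has 41 values, and each Gaussian grid 2**0 ... 2**10
# has 21 values; the Gaussian grid search evaluates 21 * 21 pairs.
C_linear = 2.0 ** np.arange(-20, 0.5, 0.5)
C_gauss = 2.0 ** np.arange(0, 10.5, 0.5)
gamma_gauss = 2.0 ** np.arange(0, 10.5, 0.5)
```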

Neural networks
In addition to the SVMs, we used fully connected feedforward NNs and calculated their accuracies. We tested NNs with one hidden layer and adopted multi-class classification as the classification model. For the activation functions, the sigmoid, identity, and softmax functions were selected for the hidden, input, and output layers, respectively. The number of units in the hidden layer was set to 300 (+1 for a bias). The initial value of each weight between units was a random variable sampled from the normal distribution, in the same way as for the WGAN (see section 2.1.3). For training, the cross-entropy error function was defined as follows:

E(w) = −Σ_{n=1..N} Σ_{k=1..K} t_k^(n) log y_k^(n),

where N is the number of training samples (trials), K is the number of classes, t_k is the target (label) data, and y_k is the output of the NN. The weights w were updated so that the cross-entropy error function was minimized using gradient descent:

w^(t+1) = w^(t) − η g^(t),

where t is the time step (i.e. the epoch), η is the learning coefficient set to 0.01, and g is the gradient of the weights. During training, the NN iterated epochs of gradient descent using all the training data, and the classification model was evaluated on the validation data. The training iteration was stopped when the cross-entropy error for the validation data remained unchanged for ten successive epochs or when 10 000 iterations were finished.

Generated data

Figure 3 shows the trial-averaged waveforms of the measured (original) and generated data for a left-hemisphere channel (CH24) and a right-hemisphere channel (CH27) of a representative participant (participant A). To compare the two, we used 20 sets of generated data for each task and averaged across 20 trials for each dataset. The generated data showed temporal profiles similar to those of the original fNIRS data. For example, increased activity was observed in the channel contralateral to the moving left hand (CH27 for the left-hand task).
Small but similar activity patterns were observed at CH24 for the right-hand task. In addition, the amplitude was larger for the more difficult bimanual task. On the other hand, the generated data included extra noise components, as indicated by their large variances.
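The NN training rule described in the neural networks subsection (cross-entropy error E(w) minimized by gradient descent with η = 0.01) can be sketched on toy stand-in data with a single softmax layer (a minimal illustration, not the paper's 300-unit network; the data here are random, not fNIRS features):

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(y, t):
    """E(w) = -sum_n sum_k t_k^(n) log y_k^(n) for one-hot targets t."""
    return -np.sum(t * np.log(y + 1e-12))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))                  # 40 toy trials, 8 features
t = np.eye(4)[rng.integers(0, 4, size=40)]    # one-hot labels, K = 4
W = rng.normal(0.0, 8 ** -0.5, size=(8, 4))   # LeCun-style init

losses = []
for _ in range(200):                          # epochs of gradient descent
    y = softmax(X @ W)
    losses.append(cross_entropy(y, t))
    g = X.T @ (y - t)                         # gradient of E(w) wrt W
    W = W - 0.01 * g                          # w <- w - eta * g
```

In the study, this plain-gradient-descent loop additionally carries early stopping based on the validation-set error, which the sketch omits.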

Classification accuracy
Next, we conducted a four-class classification analysis of both the measured and generated data using the SVMs and NNs. Figure 4 shows the mean classification accuracy across participants calculated by ten-fold cross-validation; 100 generated datasets and 16 original datasets were used for each task. The classification accuracy greatly improved after data augmentation for both the SVMs (0.733) and NNs (0.746), whereas the accuracies for the original data alone were lower (approximately 0.4; figure 4). That is, augmenting the training data with our proposed WGANs raised the accuracy to approximately 0.7 for both classification methods. The accuracies of the two classification methods were comparable, with the NN slightly higher than the SVM. This suggests that useful data can be generated by the proposed data augmentation method.
[Figure 3 caption: Trial-averaged waveforms for the right-hand, left-hand, and bimanual movement tasks as well as resting at the representative channels (upper panels: CH24; lower panels: CH27) in a cross-validation fold. Red lines: measured original data; blue lines: data generated using WGANs; shaded areas: 95% confidence intervals.]

Effects of the number of augmented data
We examined the effects of the number of augmented samples on accuracy improvement by changing the number of augmented samples used for training. Figure 5 shows how the accuracy changed with increasing numbers of augmented samples per task (class) for both classification methods. For the NNs, the accuracy improved substantially after adding 20 samples, with gradual further improvements beyond 20 samples; the highest improvement was found at 60 augmented samples. For the SVMs, the accuracy gradually increased on average with the number of augmented samples, and the accuracies for all participants appeared to improve after adding 40–60 samples.

Discussion
In this study, we aimed to improve the accuracy of fNIRS-BCIs and proposed an fNIRS data augmentation method using WGANs. The classification accuracy significantly improved after data augmentation for real fNIRS data irrespective of the classifier (SVM or NN), which indicates that the data generated by the proposed method are useful.

WGAN, GAN, and other augmentation methods
In this study, we used the WGAN-GP method for data augmentation [26]. WGAN-GP is known to have better convergence properties than a standard GAN. In fact, we found that parameter tuning was very difficult when using plain GANs on fNIRS data [24]; therefore, incorporating WGAN-GP was necessary for stable data augmentation. Note that we have not tested augmentation methods other than GANs and WGAN-GP. Simple data augmentation methods such as flipping, changing color, and adding noise are effective for image analysis (e.g. see [10]). In EEG analysis, similarly simple data augmentation [13], concatenation of segments from different trials [11,12], and empirical mode decomposition [14,15] have been shown to improve BCI accuracies. Moreover, a GAN-based method has been proposed for time-series data [33]. However, it is unclear what kind of manipulation is useful for fNIRS, because fNIRS signal properties differ from those of images and EEG data; therefore, the effects of simple augmentation techniques as well as more sophisticated methods on fNIRS signals should be examined in the future.

Number of samples
One of the most important factors for the performance of data augmentation in fNIRS-BCIs is the number of augmented samples used for classifier training. It is expected that GANs generate overly similar data, and that overfitting to the training data occurs, when little original training data is available. The accuracy improvement was highest when the number of augmented samples was approximately 60 for the NNs (figure 5(b)) and 80 for the SVMs (figure 5(a)), which suggests a limit to the improvement for our data (16 original training samples per class). Thus, consistent with previous reports on EEG data augmentation [13,15,21], we confirmed that the quality of the augmented data and the accuracy improvement depend on the number of samples. It should be noted that GAN-based data augmentation can exacerbate biases present in the original training data [34]. We believe there was no bias with respect to class, individual, gender, or handedness in our study, because data augmentation was performed separately for each participant and each class. Nevertheless, the relationship between data quality, sample number, and the effects of biases contained in the data must be explored further.

Future work
So far, we have tested only data obtained from a previous study. Applying this method to additional fNIRS data is necessary to elucidate the characteristics of the data augmentation method and of fNIRS signals, including the removal of superficial-layer artifacts, which is crucial for fNIRS data analysis [29,35]. In addition, other classifiers and features should be tested in the future. The classifiers used in this study were simple NNs and SVMs; testing other classifiers is beyond the scope of the present study, but we postulate that more complex neural networks (e.g. convolutional neural networks [6]) can be combined with our proposed data augmentation. Similarly, many previous studies have tested different types of features to be classified, including the mean, slope [36], and variance [37] (for a review, see [3]), whereas we used time-series data [4,8,9] to investigate the effect of data augmentation. These should also be examined in future work.

Conclusion
Here, we have proposed a GAN-based data augmentation method for fNIRS-BCIs. Our results have demonstrated that the accuracy of fNIRS-BCIs can be improved using the proposed data augmentation method. While many issues remain to be resolved, including parameter settings, GAN structures, and the efficacy of more sophisticated classification algorithms, this study indicates the feasibility of GAN-based methods for BCIs.