Waveform-Coded Steady-State Visual Evoked Potentials for Brain-Computer Interfaces

This study presents a novel waveform-coding method for multi-target steady-state visual evoked potential (SSVEP)-based brain–computer interfaces (BCIs). Three periodic waveforms (square, sawtooth, and sinusoidal) at various frequencies and initial phases were employed to elicit discriminable SSVEPs. A virtual keyboard was first designed using 36 visual stimuli modulated by combinations of different frequencies, phases, and waveforms. With the virtual keyboard, 13 healthy participants performed offline and online BCI experiments with a cue-guided spelling task. A task-related component analysis (TRCA)-based algorithm was used to identify the target visual stimulus. The offline results showed that visual stimuli tagged with different properties could be accurately identified by analyzing the elicited SSVEPs. Moreover, the online spelling task achieved promising performance, with an average information transfer rate (ITR) of 62.7 ± 32.5 bits/min. This study validated the feasibility of implementing a multi-command SSVEP-based BCI using the hybrid waveform-, frequency-, and phase-coding method. The proposed waveform-coding method provides a completely new channel for multi-target stimulus coding, expanding the research field of SSVEP-based BCIs.


I. INTRODUCTION
Steady-state visual evoked potentials (SSVEPs) are the electroencephalographic (EEG) responses to flickering visual stimulation. In the human visual cortex, the activation of neurons synchronizes to the flickering of the visual stimuli, resulting in an SSVEP characterized by a sinusoidal-like waveform at the stimulus frequency and its harmonics [1], [2]. The frequency components in SSVEPs are nearly stationary, and therefore the stimulus frequency can be reliably recognized by analyzing SSVEPs in the frequency domain [3]. Due to the robust frequency characteristics of SSVEPs, the frequency-tagging technique, which encodes multiple visual stimuli with different flickering frequencies, has been widely used in implementing multi-command brain-computer interfaces (BCIs) [4]. In an SSVEP-based BCI, a user gazes at one of multiple visual stimuli tagged with different flickering frequencies, and the target stimulus, i.e., the one the user is gazing at, can be identified by analyzing the recorded SSVEPs [5]. In this way, the system can indirectly translate the user's intention into the command tagged with the target stimulus for controlling external devices. With its advantages of little user training, ease of use, and high information transfer rate (ITR), the SSVEP-based BCI has received increasing attention.
A main challenge in practical SSVEP-based BCIs has been to increase the number of visual stimuli without compromising the discriminability (i.e., classification accuracy) of the SSVEPs elicited by different stimuli [6]. In the past decade, many researchers have attempted to solve this problem by designing advanced visual stimulation. Multi-step approaches, which require multiple selections to input a final command, have succeeded in implementing more commands than the number of visual stimuli (i.e., available frequencies) [7]–[9]. For example, Cecotti demonstrated a 27-command SSVEP speller using three visual stimuli with three-layer selections (i.e., 3^3 = 27) [7]. Other approaches, such as multiple frequencies sequential coding (MFSC) [10] and frequency shift keying (FSK) [11], made it possible to render a large number of visual stimuli tagged with code words consisting of sequences of distinct frequencies.
In addition to the traditional frequency-coding method, a phase-coding method has also been popularly used to modulate visual stimuli in SSVEP-based BCIs [5], [12]. In particular, the effectiveness of hybrid frequency- and phase-coding methods in improving the discriminability of SSVEPs has been demonstrated in several studies [6], [13], [14]. Indeed, the SSVEP-based BCI that achieved the highest ITR to date, 325 bits/min, employed the joint frequency-phase modulation (JFPM) approach to enhance target identification accuracy [15].
In target identification, harmonic components, as well as the fundamental frequency and phase, can be considered important features for characterizing SSVEPs. In fact, it has been well demonstrated that using harmonics as classification features significantly enhances accuracy [16]–[18]. However, few studies have explored the importance of incorporating harmonic components into the visual stimulation itself. In an SSVEP-based BCI, visual stimuli are generally modulated by square or sinusoidal waves, which produce different linear combinations of fundamental and harmonic frequency components. The latest comparison study revealed that employing square waves could lead to better performance than sinusoidal waves due to their distinct harmonic responses [19]. Our preliminary study indicated that visual stimuli modulated by different waveforms indeed elicited SSVEPs with unique frequency responses that were discriminable from each other [20]. However, the effectiveness of the waveform-coding method has yet to be validated in an online BCI experiment. Furthermore, it remains unknown whether the waveform-coding method can be combined with the hybrid frequency- and phase-coding methods.
This study aimed at validating the applicability of the waveform-coding method, integrated with a hybrid frequency- and phase-coding method, to an online BCI system. We designed a virtual keyboard with 36 visual stimuli, which were modulated by combinations of different frequencies, phases, and waveforms. Three periodic functions, namely sinusoidal, square, and sawtooth waves, were employed. Target identification was performed by a state-of-the-art method based on task-related component analysis (TRCA) [15]. With these visual stimuli, an offline experiment was first conducted to collect SSVEP data for estimating the classification accuracy. A cue-guided online BCI experiment was then conducted to validate the feasibility of an online SSVEP-based BCI using the waveform-coding method.

II. MATERIALS AND METHODS
A. PARTICIPANTS
Nine males and four females (mean age: 21.9 ± 1.2 years) with normal or corrected-to-normal vision took part in the experiment. This study was approved by the research ethics committee of Tokyo University of Agriculture and Technology. All participants provided informed consent before participating in the experiment.

B. WAVEFORM-BASED STIMULUS CODING
This study employs different periodic waveforms, namely square, sinusoidal, and sawtooth waves, to modulate multiple visual stimuli. The luminance changes of the stimuli are modulated by stimulus sequences whose dynamic range is [0, 1], where 0 and 1 represent black and white, respectively. The stimulus sequences based on square s_square(f, i, φ), sinusoidal s_sin(f, i, φ), and sawtooth s_saw(f, i, φ) waves at a stimulus frequency f and an initial phase φ are generated by the following equations:

s_square(f, i, φ) = (1/2) {1 + square[2πf (i/f_r) + φ]},   (1)
s_sin(f, i, φ) = (1/2) {1 + sin[2πf (i/f_r) + φ]},   (2)
s_saw(f, i, φ) = (1/2) {1 + sawtooth[2πf (i/f_r) + φ]},   (3)

where i indicates the frame index and f_r indicates the refresh rate of the monitor. Here, square[·], sin[·], and sawtooth[·] are functions that generate square, sinusoidal, and sawtooth waves ranging from -1 to 1, respectively. Fig. 1 illustrates the concept of the waveform-coding method using the waveforms described in (1), (2), and (3).
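The three stimulus sequences in (1)–(3) can be sketched in a few lines of Python. This is an illustration, not the authors' stimulus program; it assumes `scipy.signal`'s `square` and `sawtooth` generators, which, like the functions in the equations, range from -1 to 1.

```python
import numpy as np
from scipy.signal import square, sawtooth

def stimulus_sequence(waveform, f, phi, n_frames, fr=120):
    """Luminance sequence in [0, 1] for frame indices i = 0..n_frames-1.

    waveform : one of "square", "sin", "saw"
    f, phi   : stimulus frequency (Hz) and initial phase (rad)
    fr       : monitor refresh rate in Hz, as in Eqs. (1)-(3)
    """
    i = np.arange(n_frames)
    arg = 2 * np.pi * f * (i / fr) + phi
    gen = {"square": square, "sin": np.sin, "saw": sawtooth}[waveform]
    return 0.5 * (1 + gen(arg))  # rescale [-1, 1] -> [0, 1]

# One second of a 10-Hz square-wave stimulus with zero initial phase
# on a 120-Hz display.
seq = stimulus_sequence("square", f=10.0, phi=0.0, n_frames=120)
```

Each element of `seq` is the luminance of one display frame, so a 10-Hz stimulus completes one on/off cycle every 12 frames at 120 Hz.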

C. STIMULUS DESIGN
Thirty-six target visual stimuli were presented on a ViewPixx 3D 23-inch liquid crystal display (LCD) screen (VPixx Technologies, Inc.) with a resolution of 1,920 × 1,080 pixels and a refresh rate of 120 Hz. As shown in Fig. 2(a), the visual stimuli were arranged in a 4 × 9 matrix as a virtual keyboard with 26 English alphabet letters and 10 other symbols. Each stimulus was rendered within a 4.8 × 4.8 cm square, with an interval of 1.15 cm between two neighboring stimuli. Each stimulus was modulated by a combination of a frequency (7.5, 10.0, or 12.0 Hz), an initial phase (0, 0.5π, 1.0π, or 1.5π rad), and a waveform (square, sinusoidal, or sawtooth), as shown in Fig. 2(b). The stimulus program was developed in MATLAB (MathWorks, Inc.) using the Psychophysics Toolbox Version 3 [21].
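The 3 frequencies × 4 phases × 3 waveforms give exactly 36 unique codes, one per key. A minimal sketch of such a codebook follows; the key set (digits standing in for the "10 other symbols") and the code-to-key assignment are assumptions here, since the actual mapping is defined in Fig. 2(b).

```python
import math
from itertools import product

freqs = [7.5, 10.0, 12.0]          # stimulus frequencies in Hz
phases = [0.0, 0.5, 1.0, 1.5]      # initial phases, in units of pi rad
waveforms = ["square", "sin", "saw"]

# 36 keys: 26 letters plus 10 placeholder symbols (digits, assumed).
keys = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

# One (frequency, phase, waveform) triple per key; the ordering of the
# assignment is hypothetical.
codebook = {k: (f, p * math.pi, w)
            for k, (f, p, w) in zip(keys, product(freqs, phases, waveforms))}
```

Because every triple is unique, a classifier that recovers all three properties of the elicited SSVEP can identify the gazed key unambiguously.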

E. EXPERIMENTAL TASK
The experiment consisted of offline and online stages, which all participants completed on the same day. The offline experiment was conducted first to collect individual training data used to optimize subject-specific parameters and to calibrate the target identification method for the subsequent online stage. The online experiment was then conducted to evaluate the performance of the proposed BCI system. Throughout the experiment, participants were seated in a comfortable chair in front of the LCD screen.

1) Offline Stage
The offline stage consisted of 15 blocks. In each block, participants completed 36 trials corresponding to all 36 stimuli, gazing at the cued stimulus for 3 s in each trial. Each trial started with a visual cue indicating the target stimulus the participant was supposed to gaze at. Participants could start the stimulation by pressing the space key on a keyboard whenever they were ready after shifting their gaze to the target stimulus. To avoid ocular artifacts, participants were asked to avoid eye movements and blinks during the stimulation period. A short break of several minutes was taken between two consecutive blocks to avoid visual fatigue.

2) Online Stage
In the online stage, participants completed a cue-guided spelling task over nine blocks, in which they were instructed to gaze at the visual stimulus indicated by the stimulus program. In each block, the stimulus program randomly chose one of the following three sentences: 1) "THE QUICK BROWN FOX", 2) "JUMPS OVER", and 3) "THE LAZY DOG"; each participant completed three blocks for each sentence. In the cue-guided task, the stimulus duration was fixed to d s, which was optimized for each participant in the offline stage, with a 1-s gaze-shifting time. After each target selection, visual feedback was provided to the participants in real time. The TRCA-based method described below was used to identify the target stimuli, with the data recorded in the offline stage serving as training data to calibrate the target identification algorithm for each participant.

F. TARGET IDENTIFICATION
1) Spatial Filtering
This study used ensemble TRCA-based spatial filtering to remove background noise and/or artifacts [15]. TRCA is a method that efficiently extracts task-related components by maximizing the inter-trial covariance of EEG signals during task periods [23]. Two source signals are assumed: a task-related signal s(t) ∈ R and a task-unrelated signal n(t) ∈ R. A linear generative model of the observed multichannel EEG signal x(t) = (x_j(t)) ∈ R^{N_c} is assumed as:

x_j(t) = a_{1,j} s(t) + a_{2,j} n(t),   j = 1, 2, . . . , N_c,

where j is the channel index, N_c is the number of channels, and a_{1,j} and a_{2,j} are mixing coefficients that project the source signals to the EEG signal. The problem is to recover the task-related component s(t) from a weighted sum of the observed EEG signal x(t):

y(t) = Σ_{j=1}^{N_c} w_j x_j(t) = Σ_{j=1}^{N_c} (w_j a_{1,j} s(t) + w_j a_{2,j} n(t)).
Ideally, TRCA finds a solution satisfying Σ_{j=1}^{N_c} w_j a_{1,j} = 1 and Σ_{j=1}^{N_c} w_j a_{2,j} = 0, leading to the final solution y(t) = s(t). This problem can be solved by inter-trial covariance maximization. Let x^{(h)}(t) and y^{(h)}(t) be the h-th trial of the EEG signal and the estimated task-related component, respectively. The covariance between the h_1-th and h_2-th trials of y ∈ R^{N_s} is described as:

C_{h_1,h_2} = Cov(y^{(h_1)}, y^{(h_2)}) = Σ_{j_1,j_2=1}^{N_c} w_{j_1} w_{j_2} Cov(x_{j_1}^{(h_1)}, x_{j_2}^{(h_2)}),

where j_1 and j_2 are the channel indices of the h_1-th and h_2-th trial data, respectively, N_s is the number of sampling points in a trial, and Cov(a, b) is the covariance between vectors a and b. All possible combinations of trials are summed as:

Σ_{h_1 ≠ h_2} C_{h_1,h_2} = w^T S w,

where the matrix S = (S_{j_1 j_2})_{1≤j_1,j_2≤N_c} is defined as:

S_{j_1 j_2} = Σ_{h_1 ≠ h_2} Cov(x_{j_1}^{(h_1)}, x_{j_2}^{(h_2)}).

To obtain a finite solution, the variance of y(t) is constrained as:

Var(y) = Σ_{j_1,j_2=1}^{N_c} w_{j_1} w_{j_2} Cov(x_{j_1}, x_{j_2}) = w^T Q w = 1.

This constrained optimization problem can be solved as:

ŵ = argmax_w (w^T S w) / (w^T Q w).

The optimal weight vector is obtained as the eigenvector of the matrix Q^{-1} S corresponding to the largest eigenvalue. In an SSVEP-based BCI, TRCA can be used to obtain spatial filters for removing noise and spontaneous EEG activities.

2) Classification
This study employed a template-matching-based classification method, which uses correlation coefficients between individual templates and ongoing EEG signals as features [6], [14], [15], [24]. Filter-bank analysis was also integrated to decompose SSVEPs into sub-band components so that independent information embedded in the harmonic components could be extracted efficiently [18]. Let χ^{(m)} ∈ R^{N_n×N_c×N_s×N_t} be the individual calibration data and X^{(m)} ∈ R^{N_c×N_s} be single-trial test data of the m-th sub-band, where N_n is the number of stimuli and N_t is the number of trials. The first step in classifying SSVEPs is to obtain the spatial filter w_n^{(m)} for the n-th stimulus and m-th sub-band by applying the aforementioned TRCA to the individual calibration data of the n-th stimulus, χ_n^{(m)}. The next step is to obtain an ensemble spatial filter W^{(m)} ∈ R^{N_c×N_n} by concatenating the spatial filters obtained from all the stimuli for each sub-band:

W^{(m)} = [w_1^{(m)}, w_2^{(m)}, . . . , w_{N_n}^{(m)}].

Then, the correlation coefficient between the single-trial test data X^{(m)} and the template (i.e., the calibration data averaged across trials) for the n-th stimulus, χ̄_n^{(m)} ∈ R^{N_c×N_s}, is calculated as:

r_n^{(m)} = ρ((X^{(m)})^T W^{(m)}, (χ̄_n^{(m)})^T W^{(m)}),

where ρ(a, b) indicates Pearson's correlation between two signals a and b. Then, a weighted sum of squares of the correlation coefficients corresponding to all harmonic components is calculated as:

ρ_n = Σ_{m=1}^{N_m} a(m) · (r_n^{(m)})^2,

where N_m is the total number of harmonics and a(m) = m^{-1.25} + 0.25 according to [18]. Finally, the target class k is identified as:

k = argmax_n ρ_n.
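The classification steps above can be sketched as a small function. This is an illustrative NumPy implementation, not the authors' code: it assumes sub-band filtering has already been applied, and that the ensemble filters and trial-averaged templates have been computed beforehand.

```python
import numpy as np

def classify(X_bands, templates, W, a):
    """Ensemble-TRCA template matching with filter-bank weighting.

    X_bands   : list of length Nm; each entry is a (Nc, Ns) sub-band
                filtered single-trial test signal X^(m)
    templates : (Nm, Nn, Nc, Ns) trial-averaged calibration data
    W         : (Nm, Nc, Nn) ensemble spatial filters per sub-band
    a         : (Nm,) sub-band weights a(m) = m**-1.25 + 0.25
    Returns the index of the identified stimulus (argmax of rho_n).
    """
    n_m, n_n = templates.shape[:2]
    rho = np.zeros(n_n)
    for m in range(n_m):
        # spatially filtered test data, flattened over filters and samples
        x = (X_bands[m].T @ W[m]).ravel()
        for n in range(n_n):
            t = (templates[m, n].T @ W[m]).ravel()  # filtered template
            r = np.corrcoef(x, t)[0, 1]             # Pearson correlation
            rho[n] += a[m] * r ** 2                 # weighted sum of squares
    return int(np.argmax(rho))
```

A test trial that closely matches the n-th template yields r_n^(m) near 1 in every sub-band, so ρ_n dominates and n is selected.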

G. PERFORMANCE EVALUATION
The performance of the proposed waveform-coding method was first evaluated by classification accuracies using the offline dataset. Data epochs comprising 18-channel SSVEPs were extracted according to event triggers generated by the stimulus program. Considering the latency of the visual system [25], the data epochs were extracted in the window [0.1 s, 0.1 + d s], where time zero indicates the stimulus onset and d indicates the data length used in the analysis. All data epochs were down-sampled to 250 Hz. The classification accuracy was estimated using five-fold cross-validation with different data lengths, d, from 0.2 to 3.0 s with an interval of 0.2 s. The number of sub-bands (i.e., N_m) was set to seven in the analysis. In each of the five rounds, target identification was performed using 12 blocks for training and three blocks for testing. In addition to the 36-class target identification accuracy, the accuracies of frequency-only classification (three classes), phase-only classification (four classes), and waveform-only classification (three classes) were also calculated separately. In the online analysis, the BCI performance for each participant was also evaluated by the ITR [26]:

ITR = (60/T) [log2 N + P log2 P + (1 − P) log2((1 − P)/(N − 1))],

where N is the number of targets (i.e., 36), P is the classification accuracy, and T [s/selection] is the average time for a selection. In the online experiment, the stimulation length (i.e., data length) was selected for each participant according to his/her comfort.

III. RESULTS
Fig. 3 shows the classification accuracy for each individual participant and its average with different data lengths from 0.2 to 3.0 s with an interval of 0.2 s. Fig. 3(a) reveals that the averaged accuracy in the mixed frequency, phase, and waveform classification was significantly higher than the chance-level accuracy (i.e., 2.78%) regardless of the data length.
In addition, the classification accuracy for each property exceeded its own chance-level accuracy, although there was a gap in accuracy between the frequency-/phase-only classifications and the waveform-only classification. Note that the chance-level accuracy differs for each property (frequency-only: 33.33%, phase-only: 25.00%, and waveform-only: 33.33%). Fig. 3(b) shows that all participants achieved classification performance significantly higher than the chance-level accuracy regardless of the classification property, even with the shortest data length.

Table 1 lists the results of the online cue-guided BCI experiment. The stimulus duration was selected according to each participant's comfort. The shortest and longest stimulus durations were 0.3 s (s4) and 1.5 s (s8 and s9), respectively, whereas the gaze-shifting duration was fixed to 1 s for all participants. The averaged ITR across participants was 62.7 ± 32.5 bits/min. Across individuals, the minimum and maximum ITRs were 10.4 bits/min (s12) and 121.6 bits/min (s4), respectively.

IV. DISCUSSION
Multi-target stimulus coding plays an important role in designing SSVEP-based BCIs for various applications. In the present study, the waveform-coding method was proposed as an alternative channel for eliciting discriminable SSVEPs. The results showed that the three different waveforms could be identified accurately by analyzing the elicited SSVEPs. The waveform coding was also successfully integrated with the mixed frequency- and phase-coding method [6], [13], leading to a significantly increased number of visual stimuli. Although the feasibility of the waveform coding has been proven (Fig. 3), it was also revealed that there were large individual differences in the classification accuracy. It should be noted that the individual differences were seen not only in the waveform-only classification but also in the frequency-only and phase-only classifications. Fig. 4 depicts the relationships among the classification accuracies corresponding to the three coding properties with 0.2-s data epochs. In the figure, each dot indicates the classification accuracy for an individual participant. The accuracy of waveform-only classification was significantly associated with that of frequency-only classification (R² = .910, p < .001) and that of phase-only classification (R² = .860, p < .001).

This study also validated the effectiveness of the proposed waveform-coding method via the online BCI experiment. The averaged ITR obtained in this study was 62.7 bits/min, which is not as high as those reported in previous studies of high-speed BCIs (e.g., 267 bits/min and 325 bits/min reported in [14] and [15], respectively). This might be because the waveform coding requires a relatively longer data length to achieve a level of accuracy equivalent to the frequency- and phase-coding methods, as shown in Fig. 3. Another explanation is that the present study used a 1-s gaze-shifting time, which is longer than that of the previous studies (i.e., 0.5 s) [14], [15].
Depending on the gaze-shifting time, a drastically different online ITR would be obtained. For example, if the gaze-shifting time could be shortened to 0.5 s, s4 would achieve an ITR of 197.7 bits/min, a significant improvement over the present 121.6 bits/min. In that sense, it is important to conduct sufficient training sessions, in which users become familiar with the user interface of the system, to maximize the online BCI performance. Importantly, the experiment confirmed the feasibility of implementing an online BCI system with the proposed waveform-coding method.
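The gaze-shift arithmetic above can be checked with a small helper, assuming the standard Wolpaw ITR definition cited in [26]. Since the bits transferred per selection depend only on accuracy P and the number of targets N, shortening the time per selection T simply rescales the ITR by the ratio of old to new selection times.

```python
import math

def itr_bits_per_min(P, T, N=36):
    """Wolpaw ITR: P = accuracy, T = seconds per selection
    (stimulus duration + gaze-shifting time), N = number of targets."""
    if P >= 1.0:
        bits = math.log2(N)
    elif P <= 1.0 / N:
        bits = 0.0  # at or below chance level, no information transferred
    else:
        bits = (math.log2(N) + P * math.log2(P)
                + (1 - P) * math.log2((1 - P) / (N - 1)))
    return 60.0 * bits / T

# Bits per selection are independent of T, so cutting the gaze shift
# from 1.0 s to 0.5 s (with s4's 0.3-s stimulus) rescales the ITR:
itr_old = 121.6                  # s4's reported ITR at T = 0.3 + 1.0 s
itr_new = itr_old * 1.3 / 0.8    # ~197.6 bits/min, matching the text
```

The rescaled value (~197.6 bits/min) agrees with the 197.7 bits/min stated above up to rounding of the reported ITR.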
The proposed waveform-coding method could be generalized by employing flexible combinations of fundamental and harmonic frequency components in the stimulus sequences; the periodic functions used in this study (i.e., sinusoidal, square, and sawtooth waves) are specific examples of such combinations. By optimizing the mixing coefficients of the fundamental and harmonic components, the BCI performance could be enhanced further than with the fixed periodic functions. To this end, a novel and systematic method for the parameter search needs to be developed. In addition, there might still be room for improvement in the target identification algorithms.
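As a concrete (hypothetical) illustration of such a generalized stimulus, a sequence can be built as a weighted sum of harmonics and rescaled into the [0, 1] luminance range. The per-harmonic phase k·φ, the normalization, and the coefficient vector are assumptions of this sketch, not a method proposed in the study.

```python
import numpy as np

def harmonic_mix(coeffs, f, phi, n_frames, fr=120):
    """Stimulus sequence from an arbitrary mix of the fundamental and
    its harmonics: sum_k c_k * sin(2*pi*k*f*(i/fr) + k*phi), rescaled
    into the [0, 1] luminance range. `coeffs` holds the hypothetical
    mixing coefficients c_1, c_2, ...
    """
    i = np.arange(n_frames)
    s = sum(c * np.sin(2 * np.pi * k * f * (i / fr) + k * phi)
            for k, c in enumerate(coeffs, start=1))
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo)  # normalize to [0, 1]

# A square wave is the special case c_k = 4/(pi*k) for odd k (first
# three terms of its Fourier series):
sq_like = harmonic_mix([4/np.pi, 0, 4/(3*np.pi), 0, 4/(5*np.pi)],
                       f=10.0, phi=0.0, n_frames=120)
```

Optimizing `coeffs` per target, rather than fixing it to a sinusoidal, square, or sawtooth series, is the parameter search the paragraph above calls for.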
This study employed the ensemble TRCA-based method to analyze the waveform-coded SSVEPs because it has shown the best performance in the previous literature on SSVEP-based BCIs [15], [24], [27]. To precisely capture fine-tuned fundamental and harmonic components in SSVEPs, algorithms more suitable than the TRCA-based methods, possibly model-based ones, need to be developed.

V. CONCLUSION
The waveform-coding method was introduced in this paper as a novel approach for designing a multi-command SSVEP-based BCI. Offline and online BCI experiments were conducted using visual stimuli modulated by the mixed frequency-, phase-, and waveform-coding method. The classification accuracy obtained from the offline data revealed that the three coding properties, including the waveform, could be reliably detected using the TRCA-based target identification algorithm. The feasibility of implementing online applications with the proposed stimulus-coding method was also validated in the online experiment with the cue-guided spelling task. Since the proposed waveform-coding method is a completely new way to elicit SSVEPs, this study will expand the research field of SSVEP-based BCIs, encouraging more BCI applications requiring a large number of commands.